Abstract


The Static Single Information (SSI) form is a compiler intermediate 
representation that allows efficient sparse implementations of predicated 
analysis and backward dataflow algorithms. It possesses several attractive 
graph-theoretic properties which aid in program analysis. An extension to 
SSI form, SSI+, is also presented, along with a complete executable abstract
semantics for the representation. Applications to abstract interpretation 
and hardware compilation are discussed. 

The SSI form has been implemented on the FLEX compiler infrastruc- 
ture, and it has been used to implement several analyses and optimizations. 
Details on these predicated analysis techniques are presented, as well as 


data from the practical implementation. 
1 Introduction


This paper introduces a compiler intermediate representation: Static Sin- 
gle Information (SSI) form. This IR is the core of the FLEX compiler 
project, which is primarily investigating intelligent compilation techniques 
for distributed systems. This thesis, in presenting the IR, attempts to keep 
both the mathematician and the programmer in mind. SSI form has both 
a rigorous mathematical semantics and a factored form which aids efficient 
implementation of advanced analyses. I believe that it effectively strad- 
dles the gap between dataflow-oriented, graph-structured, and control-flow 
driven IRs, while maintaining the sparsity needed to achieve practical effi- 


ciency. The construction algorithms are linear in the size of the program. 


Our discussion of the Static Single Information form will be at times 
tied to the source language of the FLEX compiler, Java. Unlike many ab- 
stract IRs, the choices made in the design of SSI form have been dictated by 
the necessities of compiling a real-world imperative language. Java, how- 
ever, has several theoretical properties that make program analysis more 
tractable. In particular, we mention here Java’s strict constraints on pointer 
variables. Pointers in earlier languages such as C can be abused in many 


ways that Java disallows. 


Ultimately, the choice of compiler internal representation is fundamen- 
tal. Advances in IRs translate into advances in compilers. SSI form rep- 
resents a clean and simple unification of many extant ideas, and our hope 
is that it will allow the FLEX compiler to achieve a similar integration of 


practical implementation and mathematical elegance. 




2 Context and goals 


Strong et al. [40]¹ first advocated the use of compiler intermediate
representations in a 1958 committee report. Their idealistic “universal
intermediate language” was called UNCOL. Thirty years later, the Static
Single Assignment (SSA) form was introduced by Alpern, Rosen, Wegman and
Zadeck as a tool for efficient optimization in a pair of POPL papers
[2, 35], and three years after that Cytron and Ferrante joined Rosen,
Wegman, and Zadeck in explaining how to compute SSA form efficiently in
what has since become the “canonical” SSA paper [10]. Johnson and Pingali
[20] trace the development of SSA form back to Shapiro and Saint in [37],
while Havlak [17] views φ-functions as descendants of the “birthpoints”
introduced in [34].

Despite industry adoption of SSA form in production compilers [8, 9], 
academic research into alternative representations continues. Recent pro- 
posals have included Value Dependence Graphs [45], Program Dependence 
Webs [5], the Program Structure Tree [19], DJ graphs [39], and Dependence
Flow Graphs [20]. 

In comparison to these representations, the dominant characteristics of 


our Static Single Information form may be summarized as follows: 
• It names information units.
• It is complete.
• It is simple.
• It is efficient.
• It has no explicit control dependencies.
• It supports both forward and reverse dataflow analyses.

¹Attribution by Aho [1].


SSI form is used as an IR for the FLEX compiler for the Java programming 
language, which informs some of these design decisions. The FLEX com- 
piler does deep analysis and will support hardware/software co-design. SSI 
addresses these needs, concentrating on analysis rather than optimization. 
We will address each design point in turn. 

It names information units. SSA form (which we will describe further
in section 4) assigns unique names to unique static values of a variable.
However, it ignores the value information which may be added to a
variable at program branch points. SSI form renames variables at branch
points, which allows us to associate unique names with unique information
about static values. For example, a program may test the value of an
integer against zero before using it as a divisor. After the branch on the
tested predicate, it is possible to make statements about values (regarding
equality or inequality to zero) which were impossible to make previously.
SSI form allows us to exploit this additional information.

It is complete. By this we mean that there exists an executable se- 
mantics for the IR that does not require the use of information external to 
the IR. The original SSA form—and most derivatives—require use of the 
original program control flow graph during analysis, translation, or direct 
execution. In fact, φ-functions are intimately tied with the precise input
edge structure of the control flow graph, and switch nodes (where control 
flow splits) are undecipherable without referring to the control flow graph. 

In practice, this seems not a great disadvantage—it merely forces us to 
maintain a mapping of SSA statements to nodes (equivalently, basic blocks) 
of the original control flow graph. But maintaining this correspondence 
complicates editing the IR. Also, it complicates the interpretation of the 


program as a set of simultaneous equations, which SSI form will allow us 
to do. Finally, explicit control flow may limit the available parallelism of
the program. 

SSI+, as it will be presented in section 7, overcomes these difficulties
and presents a complete representation of program meaning as a set of 
simultaneous equations, without resort to graph information. 

It is simple. A bestiary of new φ-like functions has been introduced
in the past decade, including μ-, γ-, and η-functions in [5, 43], ψ- and
π-functions in [24], interprocedural φ-functions in [26], μ- and
χ-functions in [9], μ- and η-functions in [14],² and Λ-functions in [27],
among others. Some of these are orthogonal to our work: the techniques of
[24] can be used to extend SSI form to explicitly parallel source
languages, and those of [9] to languages with local variable aliasing
(absent in Java). Our goal is to achieve minimal conceptual complexity in
SSI form; that is, to introduce the minimum set of φ-like functions
necessary to represent the “interesting” properties of the compiled
program.

It is efficient. Construction of SSI form should be fast, and space
requirements should be reasonable. The original SSA algorithms required
O(E + V_SSA |DF| + N V_SSA) time.³ This bound was dominated by the time
and space required to construct the dominance frontier, as |DF|, the size
of the dominance frontier, could be O(N²) for common cases. Taking the
dominant term, we abbreviate the time complexity of Cytron’s
SSA-construction algorithm as O(N²V).

Our algorithms do not require the construction of a dominance frontier
(building on recent work on efficient SSA construction in this regard) and
run in so-called “linear” time. A more detailed analysis will be given in
section 5.4, but suffice it for now to say that our construction and
analysis algorithms are efficient.⁴

²Compare to [5, 43].

³See section 3 for definitions of the variables used in the complexity
bounds of these two paragraphs.

All explicit control dependencies are eliminated. Some researchers
(including [4] and [32]) view control dependence as a fundamental property
of the CFG, and [5, 4] suggest that accurate knowledge of
control-dependence relations is the sole key to automatic parallelization.
Often, incomplete intermediate representations⁵ are augmented with
control-dependence edges to express proper program semantics; see [20] on
DFGs and [45] on VDGs, for example.

Unfortunately, explicit control-flow edges tend to serialize computation
more than strictly necessary. Figure 7.1 on page 75, for example, contains
two parallel loops which would be serialized by the explicit control
dependency between them. Prior work often focused on fine-grain intra-loop
parallelism and ignored this coarser inter-loop parallelism.⁶ Our
objective in this work is to fully utilize coarse parallelism by removing
source-language control-dependency artifacts.

It is efficient for both forward and backward dataflow analyses.
It is often observed that traditional SSA form cannot handle backward
dataflow analysis. Johnson and Pingali note this, and suggest
anticipatability as an example of a backwards dataflow analysis where their
dependence flow graph representation betters SSA form [20]. Lo et al.
suggest the use of an “SSU” form to address much the same issue [27]. There
are in fact many analyses where both use and definition information is
utilized, and where dataflow in both forward and reverse directions occurs.
SSI form is able to handle both of these cases, as we demonstrate in
section 6.1.
⁴Dhamdhere [12] quite correctly states that Cytron’s original algorithm
has a worst-case time bound of O(N). This is also true for our algorithms.
However, these worst-case time bounds are not tight; we will present
experimental evidence that run times on real programs are O(N).

⁵See page 9 for our definition of “completeness” in an IR.

⁶We discuss the dataflow-architecture work of Traub [42] in particular in
section 7.5.
3 Definitions


We next provide some definitions. Our complexity metrics will usually be 


in terms of the following variables: 


N is the number of nodes in the program control flow graph. Each node
represents either a single statement or a basic block; the difference is 


unimportant for complexity metrics. 


E is the number of edges in the program control flow graph. For most 
programs E is reasonably assumed to be O(N), since most nodes 
have either one or two successors (simple assignments and conditional 
branches, respectively). Unusual use of computed-goto and switch 
statements may invalidate this assumption; but in these cases E is 
generally a better metric of program “complexity” than N. For this 


reason, we will call O(E) “linear in program size”.
V is the number of variables in the program. 
U is the total number of variable uses in the program. 


As the transformations we will describe split and rename variables, we will 
use subscripts to denote the number of variables, uses, or definitions in 
a particular transformed version of a program. For example, U_SSA is the
number of uses in the SSA form (see section 4) of a program. When it is
necessary to explicitly denote a metric on the untransformed program, a
zero subscript will be used; for example, V_0.

Graphs will be directed unless specified otherwise. If X and Y are
nodes in some graph G, an edge from X to Y is written X → Y. A path
X = s_0 → s_1 → ... → s_n = Y is written X →⁺ Y. A simple path is one in
which all the nodes s_i in it are distinct.
Control-flow graphs are assumed to be connected, and to contain unique
START and END nodes marking procedure entry and exit points, respectively. 
To ensure that graphs representing infinite loops are connected, an edge will 
typically exist between the START and END nodes. The presence of unique 
START and END nodes ensures that both the dominance and post-dominance 
relation define trees rooted at START and END, respectively. 
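
The dominance and post-dominance relations mentioned here can be illustrated with a small sketch. It is not part of the FLEX implementation: the function names `dominators` and `reverse`, the example graph, and the simple iterative set-intersection method are all assumptions chosen for brevity rather than efficiency.

```python
def dominators(succ, start):
    """Map each node to the set of nodes that dominate it.

    succ maps every node to a list of its successors; start is the root.
    Simple iterative algorithm, not an efficient one.
    """
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            incoming = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Reverse every edge, so post-dominance becomes dominance from END."""
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].append(n)
    return rev


# Hypothetical diamond-shaped CFG, including the START -> END edge.
cfg = {"START": ["A", "END"], "A": ["B", "C"], "B": ["D"],
       "C": ["D"], "D": ["END"], "END": []}
dom = dominators(cfg, "START")
pdom = dominators(reverse(cfg), "END")
```

Because START reaches every node and every node reaches END, each node has a well-defined set of dominators and post-dominators, which is what lets both relations be arranged as trees.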

For simplicity, we will assume that every node in the control-flow graph 
with one successor and one predecessor contains exactly one statement. A 
node with no predecessors and a node with no successors (START and END) 
are empty; they contain no statements. Nodes with multiple successors or 
multiple predecessors are also empty for conventional program
representations, but may contain multiple φ- or σ-function assignment
statements in
the SSA and SSI forms we will discuss. No node may contain both multiple 
predecessors and multiple successors. 

The symbol ⊓ will be used for the dataflow “meet” operator. The
operator ⊑ is the partial ordering relation for a lattice, and x ⊏ y iff
x ⊑ y and x ≠ y.
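
As a concrete reading of these symbols, the following sketch implements the meet and the two orderings for a flat constant-propagation lattice. The lattice itself, the names `TOP`, `BOTTOM`, `meet`, `leq`, and `lt`, and the orientation (TOP meaning "no information yet") are assumptions made for illustration; the text does not fix a particular lattice at this point.

```python
TOP, BOTTOM = "TOP", "BOTTOM"   # hypothetical lattice extremes


def meet(x, y):
    """x ⊓ y on a flat lattice: TOP above all constants, BOTTOM below."""
    if x == TOP:
        return y
    if y == TOP:
        return x
    return x if x == y else BOTTOM


def leq(x, y):
    """x ⊑ y, defined through the meet: x ⊑ y iff x ⊓ y = x."""
    return meet(x, y) == x


def lt(x, y):
    """x ⊏ y iff x ⊑ y and x ≠ y."""
    return leq(x, y) and x != y
```

Two different constants meet at BOTTOM, so `meet(2, 3)` yields `BOTTOM`, while `lt(2, 2)` is false because the strict ordering excludes equality.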


4 Static Single Assignment form 


Static Single Information (SSI) form derives many features from Static 
Single Assignment (SSA) form, as described by Cytron in [10]. To provide 


context for our definition of SSI form in section 5, we review SSA form. 


4.1 Definition of SSA form 


Static Single-Assignment form is a sparse program representation in which 
each variable has exactly one definition point. As a consequence, only one 


assignment can reach each use, which means that SSA form can be viewed
as a type of sparse def-use chain [1].

Figure 4.1: A simple program (left) and its single assignment version
(right).

For straight-line code, the SSA transformation is straightforward: each 
assignment to a variable is given a unique name (conventionally indicated 
by the use of a subscripted version of the original variable name) and each 
use is renamed to match its reaching definition. Special φ-functions must
be inserted at join points to preserve the single-assignment property.
These φ-functions have the form v0 ← φ(v1, v2) and perform an assignment
according to the path by which control flow reaches the φ-function.
Figure 4.1 shows a simple program and its SSA form; the φ-function
Y3 ← φ(Y1, Y2) in the SSA version on the right assigns Y3 the value of Y1
if control flow reaches it along the false branch of the if statement. If
the true branch is taken, Y3 will get the value of Y2 at the φ-function.
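
The behaviour of a φ-function described above can be sketched as a selection driven by the incoming control-flow edge. The helper `eval_phi`, the edge numbering, and the concrete values of Y1 and Y2 are hypothetical, chosen only to mirror the Y3 ← φ(Y1, Y2) example.

```python
def eval_phi(operands, edge_index, env):
    """Value of phi(operands...) when control arrives on edge edge_index.

    operands[i] names the variable selected when the i-th input edge
    of the join node is the one taken.
    """
    return env[operands[edge_index]]


# Assumed edge order: edge 0 is the false branch, edge 1 the true branch.
FALSE_EDGE, TRUE_EDGE = 0, 1
env = {"Y1": 5, "Y2": 7}          # made-up values for illustration

y3_if_false = eval_phi(["Y1", "Y2"], FALSE_EDGE, env)
y3_if_true = eval_phi(["Y1", "Y2"], TRUE_EDGE, env)
```

The sketch also shows why such a function cannot be interpreted without knowing the input edge structure: the operand list alone does not say which value is chosen.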

Formally, a program is said to be in SSA form if the following three 


conditions hold: 
1. If two nonnull paths X →⁺ Z and Y →⁺ Z converge at a node Z, and
nodes X and Y contain assignments to [a variable] V (in the original
program), then a trivial φ-function V ← φ(V, ..., V) has been inserted
at Z (in the new program).

2. Each mention of V in the original program or in an inserted φ-function
has been replaced by a mention of a new variable V_i, leaving the new
program in SSA form.

3. Along any control flow path, consider any use of a variable V (in
the original program) and the corresponding use of V_i (in the new
program). Then V and V_i have the same value.


This formulation of the definition is due to Cytron et al. [11]. Note that
the definition does not prohibit “extra” φ-functions not strictly required
by condition 1.


4.2 Minimal and pruned SSA forms 


Cytron et al. [11] define minimal SSA form as an SSA form using the
smallest number of φ-functions such that the above three conditions hold.
The SSA form in the previous example (Figure 4.1) is minimal.

A variation on minimal SSA form, called pruned form, avoids placing
φ-functions which define variables which are never used. The φ-functions
in pruned form are a subset of those in minimal form, so pruned form does
not strictly satisfy the given SSA criteria. In most cases, the more
regular properties of minimal SSA form outweigh the pruned form’s slight
increase in space efficiency. Choi, Cytron, and Ferrante [7] give a formal
definition and construction algorithm for pruned SSA.
Figure 4.2: Minimal (left) and pruned (right) SSA forms.


Figure 4.2 compares minimal and pruned SSA form for our example 


program. 


5 Static Single Information form 


SSI form extends SSA form to achieve symmetry for both forward and 
reverse dataflow. SSI form recognizes that information about variables 
is generated at branches and generates new names at these points. This 
provides us with a one-to-one mapping between variable names and infor- 
mation about the variables at each point in the program. Analyses can then 
associate information with variable names and propagate this information 


efficiently and directly both with and against the control-flow direction. 
Figure 5.1: A comparison of SSA (left) and SSI (right) forms.


5.1 Definition of SSI form 
Building SSI form involves adding pseudo-assignments for a variable V: 


(φ) at a control-flow merge when disjoint paths from a conditional branch
come together and at least one of the paths contains a definition of
V; and

(σ) at locations where control-flow splits and at least one of the disjoint
paths from the split uses the value of V.


Figure 5.1 compares the SSA and SSI forms for the example of Figure 4.1.
Note that X is renamed at the conditional branch, allowing the compiler to
distinguish X1 (which is always the constant 2) from X2 (which is never
equal to 2).
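
A σ-function can be read as the dual of a φ-function: it gives the incoming value a fresh name on whichever outgoing edge is taken. The sketch below assumes the branch predicate is X0 = 2 (consistent with X1 always being 2) and that X1 lives on the true edge; the helper names `eval_sigma` and `run` and the `facts` table are hypothetical.

```python
def eval_sigma(targets, source, taken_edge, env):
    """(targets...) <- sigma(source): define only the target of the taken edge."""
    out = dict(env)
    out[targets[taken_edge]] = env[source]
    return out


def run(x0):
    """Execute the branch and the sigma-function for one input value."""
    env = {"X0": x0}
    env["P0"] = (env["X0"] == 2)          # assumed predicate
    edge = 0 if env["P0"] else 1          # assumed: edge 0 is the true branch
    return eval_sigma(["X1", "X2"], "X0", edge, env)


# Information an analysis may attach to each new name, independent of input.
facts = {"X1": "== 2", "X2": "!= 2"}

taken = run(2)       # defines X1 only
not_taken = run(5)   # defines X2 only
```

Since each name is defined on exactly one side of the branch, a fact such as "equals 2" can be stored against the name itself rather than against a program point.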

Formally, a program transformation to SSI form satisfies the following 


conditions: 
1. If two nonnull paths X →⁺ Z and Y →⁺ Z exist having only the node Z
where they converge in common, and nodes X and Y contain either
assignments to a variable V in the original program or a φ- or
σ-function for V in the new program, then a φ-function for V has been
inserted at Z in the new program. [Placement of φ-functions.]

2. If two nonnull paths Z →⁺ X and Z →⁺ Y exist having only the node
Z where they diverge in common, and nodes X and Y contain either
uses of a variable V in the original program or a φ- or σ-function for
V in the new program, then a σ-function for V has been inserted at
Z in the new program. [Placement of σ-functions.]

3. For every node X containing a definition of a variable V in the new
program and node Y containing a use of that variable, there exists
at least one path X →⁺ Y and no such path contains a definition of V
other than at X. [Naming after φ-functions.]

4. For every pair of nodes X and Y containing uses of a variable V defined
at node Z in the new program, either every path Z →⁺ X must contain
Y or every path Z →⁺ Y must contain X. [Naming after σ-functions.]

5. For the purposes of this definition, the START node is assumed to
contain a definition and the END node a use for every variable in the
original program. [Boundary conditions.]

6. Along any possible control-flow path in a program being executed
consider any use of a variable V in the original program and the
corresponding use of V_i in the new program. Then, at every occurrence
of the use on the path, V and V_i have the same value. The path need
not be cycle-free. [Correctness.]
As with the SSA conditions, this definition does not prohibit “extra”
φ- or σ-functions not required by conditions 1 and 2.


Property 5.1. There exists exactly one reaching definition of V at
every non-φ-function use of V in the new program.
Proof. Offner [29] defines a reaching definition as follows: 


A definition of a variable v reaches the point P in the program 
iff there is a path from the definition to P on which... there is 


no other definition of v.... 
From this definition and condition 3 we directly obtain the property. □


Note that condition 3 and this property do not require there to be 
exactly one definition of any variable V, just that at every use only a single 
definition is relevant. The renaming algorithm we will present enforces the 


stricter single-definition constraint. 


Property 5.2. Every cycle-free path S →⁺ Y from the START node to
a node Y containing a non-φ-function use of a variable must contain
exactly one node X defining that variable in the new program. Likewise,
every path X →⁺ E from a node X containing a non-σ-function definition
of a variable to the END node must contain every node Y which is a use
of that variable in the new program.


Proof. Let us call the variable v. Conditions 5 and 6 ensure that there
exists at least one definition node X for v from which Y is reachable:
conditions 5 and 6 substitute the START node, from which every node is
reachable, for any use of v not reachable by some other definition in the
original program. So assume this definition node X exists, but is not on
the path S →⁺ Y. Then X →⁺ Y and S →⁺ Y must have some earliest node
N in common. But N must then have a φ-function for v by condition 1,
which violates either our choice of Y as a non-φ-function use (if N = Y)
or else condition 3 which prohibits definitions other than at X. If
S →⁺ Y contains more than one node X_i defining v, then the path
X_0 →⁺ Y between the first and Y also violates condition 3. So S →⁺ Y
must contain exactly one definition X of v.

The second part is symmetric. Assume there exists some node Y using
v which is not contained on some path X →⁺ E. The path X →⁺ Y must exist
by conditions 3 and 5. And X →⁺ E and X →⁺ Y must have some final node
N in common, which must have a σ-function for v by condition 2. The
case N = X violates the choice of X as a non-σ-function definition. But if
N ≠ X, then condition 3, which prohibits paths with multiple definitions,
is violated. Thus X →⁺ E must contain every use of v. □


Property 5.3. Every definition of a variable V dominates all
non-φ-function uses of V and every use of V post-dominates any
non-σ-function reaching definition of V in the new program.
Proof. The dominance relation is defined in Offner [29] as: 


If x and y are two elements in a flow graph G, then x dominates
y (x is a dominator of y) iff every path from s [START] to y 


includes x. 


Post-dominance is the dual on a flow graph with edges reversed: x post- 
dominates y iff every path from END to y includes x. 

The previous property showed that every path from START to a
non-φ-function use contained a unique definition node X. If two paths from
START to Y contained different definition nodes X_i, then Y would be a
φ-function, which it was chosen not to be. So every non-φ-function use is
dominated by the single definition node. Likewise the previous property
showed that every path from a non-σ-function definition to END must
include every use; therefore every use post-dominates a non-σ-function
definition. □


5.2 Minimal and pruned SSI forms 


Minimal and pruned SSI forms can be defined which parallel their SSA
counterparts. Minimal SSI form would have the smallest number of φ-
and σ-functions such that the above conditions are satisfied. Pruned SSI
form is the minimal form with any unused φ- and σ-functions deleted; that
is, it contains no φ- or σ-functions after which there are no subsequent
non-φ- or σ-function uses of any of the variables defined on the left-hand
side.⁷ Figure 5.2 compares minimal and pruned SSI form for our example
program.

Note that, as in SSA form, pruned SSI does not strictly satisfy the SSI
constraints because it omits dead φ- and σ-functions otherwise required by
conditions 1 and 2 of the definition. In practice, a subtractive definition
of pruned form (generate minimal form and then remove the unused
φ- and σ-functions) is most useful, but a constructive definition can be
generated from the standard SSI form definition as follows:


1. The convergence/divergence node Z of conditions 1 and 2 must also
satisfy: “and there exists a path Z →⁺ U to a node U, a use of V in the
original program, which does not contain another definition of V.”


2. The boundary condition 5 at END can be loosened as follows (emphasis
indicates modifications): “For the purposes of this definition, the
START node is assumed to contain a definition for every variable in
the original program and the END node a use for every variable live
at END in the original program.”

⁷An even more compact SSI form may be produced by removing σ-functions
for which there are uses for exactly one of the variables on the left-hand
side, but by doing so one loses the ability to perform renaming at
control-flow splits which generate additional value information.

Figure 5.2: Minimal (left) and pruned (right) SSI forms.


Pruned form is defined as having the minimal set of φ- and σ-functions
that satisfy the amended conditions. It can easily be verified that the
modifications suffice to eliminate unused φ- and σ-functions: if the
variable defined in a φ- or σ-function is used, there must exist a path
Z →⁺ U as mandated by amendment 1, where amendment 2 lets U = END for
variables live exiting the procedure and thus usefully defined.


Property 5.4. A node Z gets a φ- or σ-function for some variable V_i
in pruned SSI form only if the corresponding variable V is live at Z in
the original program.


Proof. This is a trivial restatement of amendment 1. A variable v is said to
be live at some node N if there exists a node U using v and a path N →⁺ U
on which no definitions of v are to be found. If V is not live at Z then no
path Z →⁺ U satisfying the amended conditions 1 and 2 can be found and
neither a φ- nor a σ-function can be placed. Amendment 2 ensures this
holds true at boundaries. □
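
The liveness notion used in this proof can be computed with a standard backward dataflow pass. The sketch below is a generic illustration, not the FLEX code: the function name `live_on_entry`, the per-node `use` and `defs` maps, and the five-node example are assumptions. It computes liveness on entry to each node, treating a use in a node as occurring before any definition in that node.

```python
def live_on_entry(succ, use, defs):
    """Variables live on entry to each node of a CFG.

    succ maps node -> successors; use/defs map node -> set of variables
    used or defined there (missing entries mean the empty set).
    """
    live = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            live_out = set().union(*(live[s] for s in succ[n]))
            new = use.get(n, set()) | (live_out - defs.get(n, set()))
            if new != live[n]:
                live[n] = new
                changed = True
    return live


# Hypothetical program: n1 defines x, n2 branches, n3 uses x,
# n4 defines y, n5 uses y.
succ = {"n1": ["n2"], "n2": ["n3", "n4"], "n3": ["n5"],
        "n4": ["n5"], "n5": []}
defs = {"n1": {"x"}, "n4": {"y"}}
use = {"n3": {"x"}, "n5": {"y"}}
live = live_on_entry(succ, use, defs)
```

In this example x is live at the split node n2, so pruned form may place a σ-function for it there, while on the n4 side x is dead and no further φ- or σ-function for x is warranted.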


5.3 Fast construction of SSI form


The most common construction algorithm for SSA form [11] uses dominance
frontiers and suffers from a possible quadratic blow-up in the size
of the dominance frontier for certain common programming constructs.
Various improved algorithms use such things as DJ graphs [38] and the
dependence flow graph [20] to achieve O(EV) time complexity for
φ-function placement. We build on this work to achieve O(EV) construction
of SSI form, and present a new algorithm for variable renaming in SSI form
after φ- and σ-functions are placed.

Our construction algorithm begins with a program structure tree of 
single-entry single-exit (SESE) regions, constructed as described by John- 
son, Pearson, and Pingali [19]. We will review the algorithms involved, as 
their published descriptions [18] contain a number of errors. 


We begin with a few definitions from [19]. 


Definition 5.1. Edges a and b are said to be edge cycle-equivalent 
in a graph iff every cycle containing a contains b, and vice-versa. 
Similarly, two nodes are said to be node cycle-equivalent iff every


cycle containing one of the nodes also contains the other. 
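
Definition 5.1 can be checked directly on small graphs by enumerating cycles. The brute-force sketch below is exponential and is meant only to make the definition concrete; it is not the linear-time algorithm discussed in this section. The names `simple_cycles` and `cycle_equivalent` and the example graph are assumptions, and the check considers simple cycles only, which suffices because any cycle through an edge contains a simple cycle through that edge.

```python
def simple_cycles(succ):
    """All simple cycles of a directed graph, each as a frozenset of edges."""
    cycles = set()

    def dfs(start, node, path_edges, visited):
        for nxt in succ.get(node, []):
            edge = (node, nxt)
            if nxt == start:
                cycles.add(frozenset(path_edges + [edge]))
            elif nxt not in visited:
                dfs(start, nxt, path_edges + [edge], visited | {nxt})

    for s in succ:
        dfs(s, s, [], {s})
    return cycles


def cycle_equivalent(succ, e1, e2):
    """True iff every simple cycle contains both edges or neither."""
    return all((e1 in c) == (e2 in c) for c in simple_cycles(succ))


# Hypothetical diamond CFG, already closed with an END -> START edge.
g = {"START": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"],
     "D": ["END"], "END": ["START"]}
```

Here the edges START → A and D → END lie on both cycles and are therefore cycle-equivalent, whereas A → B and A → C each lie on a different cycle and are not.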


Definition 5.2. A SESE region in a graph G is an ordered edge pair 


(a,b) of distinct control flow edges a and b where 


1. a dominates b, 
2. b postdominates a, and 


3. every cycle containing a also contains b and vice-versa. 
Edges a and b are called the entry and exit edges, respectively.
Definition 5.3. A SESE region (a,b) is canonical provided 


1. b dominates b’ for any SESE region (a,b'), and 
2. a postdominates a’ for any SESE region (a',b). 


We will give time bounds in terms of N and E, the number of nodes
and edges of the control-flow graph, respectively. Placement of φ- and
σ-functions is also dependent on V, the number of variables in the program.
Since SSI renaming increases the number of variables, we will use V_0 and
V_SSI to indicate the number of variables in the original program and SSI
form, respectively.

Note that V is O(N) at most, since our representation only allows a
constant number of variable definitions per node. Typically V_0 will be
much smaller than N, but V_SSI need not be. Also E may be as large as
O(N²), but in most control-flow graphs is O(N) instead, as node arities are
typically limited by a constant.


5.3.1 Cycle-equivalency 


The identification of SESE regions begins by computing the cycle-equivalency 
of the edges in the program control flow graph. The cycle-equivalency algo- 
rithm works on undirected graphs, so we prepare the directed control flow 


graph G as follows: 


1. Add an edge from END to START in G. It is common practice to 
add an edge from START to END in order to root the control depen- 
dence graph at START [10]. However, our goal is not rooted control 
dependence but to make the control flow graph into a single strongly 
connected component; for this reason the direction of the edge is from 
END to START instead. 
Figure 5.3: Transformation from directed to undirected graph (from [18]).


2. Create an equivalent undirected graph. Johnson et al. prove that 
the node expansion illustrated in Figure 5.3 results in an undirected 
graph with the same cycle-equivalency properties as the original di- 
rected graph. More precisely, nodes a and b in directed graph G are 
cycle-equivalent if and only if nodes a’ and b’ are cycle-equivalent in 
transformed undirected graph G’. The nodes n_i and n_o generated
by the expansion are termed not representative; the node n’ in G’ 
is said to be representative of node n in G. Obviously, this corre- 
spondence must be recorded during the transformation so we may 


properly attribute the cycle-equivalency properties of n’ to n later. 


3. Perform a pre-order numbering of nodes in G’. This is done 
with a simple depth-first search of G’. When we visit a node a_i or
a_o, we prefer to visit a’ before any other neighbor. This ensures that
representative nodes are interior nodes in the DFS spanning tree. The 
START node is numbered 0, and succeeding nodes in the traversal get 
increasing numbers. Thus low-numbered nodes are closest to START 


and we will call them “highest” in the DFS spanning tree. 
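
The three preparation steps can be sketched as follows. Several details are assumptions, since the figure showing the expansion is not reproduced here: each node n is taken to expand into a chain n_i, n', n_o with incoming edges attached to n_i and outgoing edges to n_o, and the traversal is rooted at the representative node of START. The names `preprocess` and `preorder` and the tuple encoding of expanded nodes are hypothetical.

```python
def preprocess(succ, start="START", end="END"):
    """Build undirected adjacency lists for G' from a directed CFG."""
    directed = {n: list(ts) for n, ts in succ.items()}
    directed[end] = directed.get(end, []) + [start]      # step 1
    adj = {}

    def link(a, b):
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    for n in directed:                                   # step 2 (assumed shape)
        link((n, "i"), (n, "rep"))
        link((n, "rep"), (n, "o"))
    for a, targets in directed.items():
        for b in targets:
            link((a, "o"), (b, "i"))
    return adj


def preorder(adj, root):
    """Step 3: DFS preorder numbers, visiting a representative node first."""
    number = {}

    def visit(v):
        number[v] = len(number)
        # Stable sort: the representative neighbor, if any, comes first.
        for w in sorted(adj[v], key=lambda w: w[1] != "rep"):
            if w not in number:
                visit(w)

    visit(root)
    return number


cfg3 = {"START": ["A"], "A": ["END"], "END": []}
adj = preprocess(cfg3)
num = preorder(adj, ("START", "rep"))
```

With the preference in `visit`, a representative node is always numbered immediately after the first of its two companions to be reached, so it always has a child in the spanning tree.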


The above steps form an undirected graph G’ from the control-flow 


graph G. The remainder of the cycle-equivalency algorithm is presented 
Data type BracketList:


create(): BracketList : Make an empty BracketList structure 
size(bl:BracketList): integer : Number of elements in BracketList structure 
push(bl:BracketList, e:bracket): BracketList : Push e on top of bl 
top(bl:BracketList): bracket : Topmost bracket in bl 
delete(bl:BracketList, e:bracket): BracketList : Delete e from bl
concat(bl1,bl2:BracketList): BracketList : Concatenate bl1 and bl2 


Operations on nodes: 


Number(n:node): integer : DFS preorder number of node 

NQClass(n:node): integer : Cycle-equivalency class of node 

BList(n:node): BracketList : List of brackets of node 

Hi(n:node): integer : Highest destination node of any edge originating from a 
descendant of node n 


Operations on edges: 


EQClass(e:edge): integer : Cycle-equivalency class of edge

RecentSize(e:edge): integer : Size of bracket set when e was most recently the
topmost bracket for a representative node

RecentClass(e:edge): integer : Cycle-equivalency class number of the
representative node for which e was most recently the topmost bracket


Figure 5.4: Datatypes and operations for the cycle-equivalency algorithm. 
Procedure cycle_equiv (G: CFG)
{
  /* Preprocessing */
  G’ := Preprocess (G); /* described in text */

  /* Compute CD equivalence classes */
  for each node n of G’, in reverse depth-first order, do {
    /* Compute Hi(n) */
    /* hi0 is highest using backedges only */
    hi0 := min{ Number(t) | (t,n) is a backedge };
    /* hi1 is highest through children */
    hi1 := min{ Hi(c) | c is a child of n };
|   /* hi2 is lowest through children */
|   hi2 := max{ Hi(c) | c is a child of n };

    Hi(n) := min{ hi0, hi1 };

    /* Compute BList(n) */
    BList(n) := create ();

    for each child c of n, do
      BList(n) := concat (BList(n), BList(c));

    for each backedge <d, n> from a descendant d of n to n, do
      BList(n) := delete (BList(n), <d, n>);
|   for each capping backedge <d, n> of n, do
|     BList(n) := delete (BList(n), <d, n>);

    for each backedge <n, a> from n to an ancestor a of n, do {
      BList(n) := push (BList(n), <n, a>);
      RecentSize(<n, a>) := -1; /* not a representative node */
    }

    if n has more than one child, then {
      BList(n) := push (BList(n), <n, hi2>); /* capping backedge */
|     RecentSize(<n, hi2>) := -1;
|     add <n, hi2> to capping backedges list of hi2;
    }

    /* Compute NQClass (n) */
    if n is a representative node, then {
      if RecentSize (top (BList(n))) != size (BList(n)), then {
        /* start a new equivalence class */
        RecentSize (top (BList(n))) := size (BList(n));
        RecentClass (top (BList(n))) := new-class-name();
      }
      NQClass (n) := RecentClass (top (BList(n)));
    }
  } /* for each node */
}
Algorithm 5.1: The cycle-equivalency algorithm (corrected from [18]). 


(START, 1) =cq (16, END) 
(1,2) =cq (8, 16) 
(2,3) =cq (3,4) =cq (7,8)
(4,5) =cq (5,7) 
(4,6) =cq (6, 7) 
(1,9) =cq (9,10) =cq (14,15) =cq (15,16)
(10,11) =cq (11, 13) 


Figure 5.5: Control flow graph and cycle-equivalent edges. 


as Algorithm 5.1, with the above procedure corresponding to the
statement G’ := Preprocess(G). The algorithm has been corrected from
the published version in [18]; in addition, it has been extended to
compute both node and edge equivalencies (in effect, merging in the
algorithm of [19]). Lines modified from the presentation in [18] are
indicated in the figure with a vertical bar in the left margin. The
datatype BracketList and the node and edge properties used in the
algorithm are described in Figure 5.4. The interested reader is
encouraged to consult [18] for additional detail on these data
structures and representations. Figure 5.5 shows cycle-equivalent
regions in a simple control-flow graph. We use the notation
(a,b) =cq (c,d) to indicate that the CFG edge from node a to node b is
edge cycle-equivalent to the edge from node c to node d.
Calculating cycle-equivalent regions is based on a single reverse
depth-first traversal of G, so as long as all datatype operations in
Figure 5.4 can be completed in constant time (and [18] shows how to do
so), this computation is O(E).
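One way to meet the constant-time requirement on the BracketList operations of Figure 5.4 is an intrusive doubly linked list, in which each bracket carries its own list links. The sketch below is an illustration under that assumption and is not taken from [18]; the class and attribute names are hypothetical. It also assumes that concat is destructive and that the top of the first list stays on top.

```python
class Bracket:
    """A bracket (backedge) that doubles as its own list cell."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.prev = None
        self.next = None


class BracketList:
    def __init__(self):
        self.head = None   # topmost bracket
        self.tail = None
        self.count = 0

    def size(self):
        return self.count

    def top(self):
        return self.head

    def push(self, e):
        e.prev, e.next = None, self.head
        if self.head is None:
            self.tail = e
        else:
            self.head.prev = e
        self.head = e
        self.count += 1
        return self

    def delete(self, e):
        # Assumes e is currently a member of this list.
        if e.prev is None:
            self.head = e.next
        else:
            e.prev.next = e.next
        if e.next is None:
            self.tail = e.prev
        else:
            e.next.prev = e.prev
        e.prev = e.next = None
        self.count -= 1
        return self

    def concat(self, other):
        # Splice other onto the bottom of self and empty other.
        if other.head is not None:
            if self.head is None:
                self.head = other.head
            else:
                self.tail.next = other.head
                other.head.prev = self.tail
            self.tail = other.tail
            self.count += other.count
            other.head = other.tail = None
            other.count = 0
        return self
```

Every operation touches a fixed number of links, so none depends on the length of the list. Deleting by bracket is constant time only because the bracket itself holds its neighbors; a list that had to search for the bracket would not meet the bound.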


5.3.2 SESE regions and the program structure tree 


Johnson, Pearson, and Pingali show how to construct a tree structure of
nested SESE regions from the cycle-equivalency information in [19]. The
cycle-equivalent regions are sorted by dominance using a simple
depth-first traversal of the graph, and then canonical SESE regions are
found by taking adjacent pairs of edges from the cycle-equivalence
classes. Another depth-first search of the CFG suffices to obtain the
nesting of these regions, which is represented in a data structure
called the program structure tree. The algorithm and data structures
required are presented in Figure 5.6 and Algorithm 5.2. Figure 5.7
shows the SESE regions on the left and the program structure tree on
the right for the example of Figure 5.5.⁸

The time complexity for constructing the PST is easily seen to be O(E).
Algorithm 5.2 begins with a depth-first traversal of G to construct an
ordered edge list for each cycle-equivalent region; the traversal is
O(E) and the list-append operation can be done in constant time. We then
iterate through the cycle-equivalence classes and the edge list of
each, constructing SESE regions. No edge can be on more than one list,
so this step is O(E). Finally, we do a last O(E) depth-first traversal
of G, performing the constant-time operations append and LinkRegion.
All steps are O(E), and their sequential composition is also O(E).
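The first two phases described above (bucketing edges by cycle-equivalence class, then pairing adjacent edges into canonical SESE regions) can be sketched as follows. This is an illustrative Python fragment, not the FLEX code. The function name is hypothetical, regions are represented simply as (entry edge, exit edge) tuples, and the caller is assumed to supply the edges already in depth-first order together with their class labels.

```python
from collections import defaultdict


def canonical_sese_regions(edges_in_dfs_order, eq_class):
    """Pair adjacent edges of each cycle-equivalence class.

    eq_class maps each edge to its class label.  Returns the list of
    regions plus the EntryRegion and ExitRegion maps.
    """
    cq_list = defaultdict(list)
    for e in edges_in_dfs_order:            # one append per edge
        cq_list[eq_class[e]].append(e)

    regions, entry_region, exit_region = [], {}, {}
    for lst in cq_list.values():
        for entry, exit_ in zip(lst, lst[1:]):   # adjacent pairs only
            r = (entry, exit_)
            regions.append(r)
            entry_region[entry] = r
            exit_region[exit_] = r
    return regions, entry_region, exit_region


# Two of the classes listed with Figure 5.5; the labels "a" and "b"
# and the traversal order are assumptions made for this example.
eq = {("START", 1): "a", (16, "END"): "a",
      (1, 9): "b", (9, 10): "b", (14, 15): "b", (15, 16): "b"}
order = [("START", 1), (1, 9), (9, 10), (14, 15), (15, 16), (16, "END")]
regions, entry_of, exit_of = canonical_sese_regions(order, eq)
```

A class with k edges yields k - 1 regions, and an interior edge of a class is both the exit of one region and the entry of the next, which is why separate entry and exit maps are kept.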


⁸ In addition, the regions c,d,e and f,g are sequentially composed [19].
However, our SSI construction algorithm doesn’t use this property.
Data type EdgeList:


size(el:EdgeList): integer : Number of elements in EdgeList structure 
head(el:EdgeList): edge : First edge in el 

tail(el:EdgeList): EdgeList : EdgeList like el but missing first element 
append(el:EdgeList, e:edge): EdgeList : Add e to the end of el


Data type Region: 


NewRegion(e1:edge, e2:edge): Region : Creates a new region with entry e1
and exit e2 and no parent


Entry(r:Region): Edge : The entry edge of r 

Exit(r:Region): Edge : The exit edge of r 

Parent(r:Region): Region : The parent of r, or nil if none 
Nodes(r:Region): NodeList : A list of nodes in r
LinkRegion(r1,r2:Region): void : Sets the parent of r2 to be r1


Operations on nodes: 


Mark(n:node): boolean : Visited status during DFS 
SESE(n:node): Region : The canonical SESE of n 


Operations on edges: 


EntryRegion(e:edge): Region : the region with entry e, or nil if none exists 
ExitRegion(e:edge): Region : the region with exit e, or nil if none exists 


Figure 5.6: Datatypes and operations used in construction of the PST. 
NestedSESE(G: CFG) =

  /* initialize */
  for all nodes n of G do
    Mark(n) ← false
  for all edges e of G do
    EntryRegion(e) ← nil
    ExitRegion(e) ← nil

  /* order edges within cycle-equivalency classes by dominance */
  for each edge e of G in depth first order do
    CQList(EQClass(e)) ← append(CQList(EQClass(e)), e)

  /* get all canonical SESE regions */
  for all equivalency classes q do
    l ← CQList(q)
    while size(l) > 1 do
      r ← NewRegion(head(l), head(tail(l)))
      EntryRegion(Entry(r)) ← r
      ExitRegion(Exit(r)) ← r
      l ← tail(l) /* advance to the next adjacent pair */

  /* determine proper nesting of SESE regions */
  VisitNode(START, top-region)


VisitNode(n: node, r: Region) =
  if Mark(n) = false then
    Mark(n) ← true
    /* record mapping from n to r */
    SESE(n) ← r
    Nodes(r) ← append(Nodes(r), n)

    for each edge (n,n’) from n to n’ do
      r1 ← EntryRegion((n,n’))
      r2 ← ExitRegion((n,n’))
      if r = r1 or r = r2 then
        rn ← Parent(r) /* exiting current region */
      else
        rn ← r
      if r1 ≠ nil and r1 ≠ r then
        LinkRegion(rn, r1) /* entering new region */
        rn ← r1
      if r2 ≠ nil and r2 ≠ r then
        LinkRegion(rn, r2) /* entering new region */
        rn ← r2
      VisitNode(n’, rn)
Algorithm 5.2: Computing nested SESE regions and the PST.


Figure 5.7: SESE regions and PST for the CFG of Figure 5.5 (from [19]). 


5.3.3 Placing φ- and σ-functions


As with the presentation of SSA form in [11], we split construction of
SSI form into two parts: placing φ- and σ-functions and renaming
variables. The placement algorithm runs in O(NV_0) time, and is
presented as Algorithm 5.3. No new node properties or datatypes are
required; however, it is parameterized on a function called MaybeLive.
For minimal SSI form, MaybeLive should always return true. If pruned
SSI form is the goal, faster practical run time may be obtained by
allowing MaybeLive to return any conservative approximation of variable
liveness information, which allows early suppression of unused φ- and
σ-functions. Note that MaybeLive need not be precise; conservative
values will only result in an excess of φ- and σ-functions, not an
invalid SSI form.
Section 5.3.6 describes a post-processing algorithm to efficiently remove the
Place(G: CFG) =
1: let r be the top-level region for G 
2: for each variable v in G do 
3: PlaceOne(r, v, false) /* place phis */ 
4:  PlaceOne(r, v, true) /* place sigmas */ 


PlaceOne(r: region, v: variable, ps: boolean): boolean =
 1: /* Post-order traversal */
 2: flag ← false
 3: for each child region r’ do
 4:   if PlaceOne(r’, v, ps) then
 5:     flag ← true
 6:
 7: for each node n in region r not contained in a child region do
 8:   if ps is false and n contains a definition of v then
 9:     flag ← true
10:   if ps is true and n contains a use of v then
11:     flag ← true
12:
13: /* add phis/sigmas to merges/splits where v may be live */
14: if flag = true then
15:   for each node n in region r not contained in a child region do
16:     if MaybeLive(v, n) = true then
17:       if ps is false and the input arity of n exceeds 1 then
18:         place a phi function for v at n
19:       if ps is true and the output arity of n exceeds 1 then
20:         place a sigma function for v at n
21:
22: return flag


Algorithm 5.3: Placing φ- and σ-functions.
excess φ- and σ-functions.⁹ The remainder of this section will be
devoted to a correctness proof of Algorithm 5.3.
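Algorithm 5.3 can be sketched in Python over an explicit region tree as follows. This is an illustration only: the Node and Region classes, their field names, and the representation of placed functions as sets of variable names are assumptions of the sketch, and the default maybe_live always returns true (the setting that performs no pre-pruning).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    defs: set = field(default_factory=set)
    uses: set = field(default_factory=set)
    in_arity: int = 1
    out_arity: int = 1
    phis: set = field(default_factory=set)     # variables given a phi here
    sigmas: set = field(default_factory=set)   # variables given a sigma here


@dataclass
class Region:
    nodes: list                                # nodes not in any child region
    children: list = field(default_factory=list)


def place_one(r, v, ps, maybe_live=lambda v, n: True):
    """Post-order traversal; ps=False places phis, ps=True places sigmas."""
    flag = False
    for child in r.children:
        # Every child must be visited, so no short-circuiting here.
        if place_one(child, v, ps, maybe_live):
            flag = True
    for n in r.nodes:
        if (not ps and v in n.defs) or (ps and v in n.uses):
            flag = True
    if flag:
        for n in r.nodes:
            if maybe_live(v, n):
                if not ps and n.in_arity > 1:
                    n.phis.add(v)
                if ps and n.out_arity > 1:
                    n.sigmas.add(v)
    return flag


def place(top_region, variables, maybe_live=lambda v, n: True):
    for v in variables:
        place_one(top_region, v, False, maybe_live)   # place phis
        place_one(top_region, v, True, maybe_live)    # place sigmas
```

In this sketch a region with no definition (use) of v anywhere beneath it returns false and receives no φ-functions (σ-functions) for v, while a definition or use inside a child region sets the flag in every enclosing region as well.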


Lemma 5.1. No φ-functions (σ-functions) for a variable v are needed
in an SESE region not containing a definition (use) of v.


Proof. Let us assume a φ-function for v is needed at some node Z inside
an SESE not containing a definition of v. Then by condition 1 of the SSI
form definition, there exist paths X →⁺ Z and Y →⁺ Z having no nodes but
Z in common, where X and Y contain either definitions of v or φ- or
σ-functions for v. Choose any such paths:

Case I: Both X and Y are outside the SESE. Then, as there is only one
entrance edge into the SESE, the paths X →⁺ Z and Y →⁺ Z must
contain some node in common other than Z. But this contradicts our
choice of X and Y.

Case II: At least one of X and Y is inside the SESE. If both X and
Y are not definitions of v but rather φ- or σ-functions for v, then
by recursive application of this proof there must exist some choice
of X, Y, and Z inside this SESE where at least one of X and Y is a
definition. But X or Y cannot be a definition of v because they are
inside the SESE of Z, which was chosen to contain no definitions of v.

A symmetric argument holds for σ-functions for v, using condition 2 of
the SSI form definition, and the fact that there exists one exit edge
from the SESE. □

⁹ Note that equivalent results could be obtained by adding a φ-function
for every variable at every merge and a σ-function for every variable at
every split, and post-processing. In fact the same time bounds
(O(NV_0)) would be obtained. There is a large practical difference in
actual run time and space costs, however, which motivates our more
efficient approach.
The above lemma justifies line 14 of Algorithm 5.3, which skips over
any SESE region not containing a definition (use) of v when placing
φ-functions (σ-functions) for v.


Lemma 5.2. If a definition (use) or a φ- or σ-function for a variable
v is present at some node D (U), then a φ-function (σ-function) for
v is needed at every node N:

1. of input (output) arity greater than 1,
2. reachable from D (from which U is reachable),
3. whose smallest enclosing SESE contains D (U), and
4. which is not dominated by D (not post-dominated by U).


Proof. We will first prove that a node N failing any one of the
conditions does not need a φ- or σ-function.

• Conditions 1 and 2 of the SSI form definition require node N to be
  the first convergence (divergence) of some paths X →⁺ N and Y →⁺ N
  (N →⁺ X and N →⁺ Y). If the input arity is less than 2 or there is no
  path from a definition of v, then it fails the φ-placement criterion 1.
  If the output arity is less than 2 or there is no path to a use of v,
  then it fails the σ-placement criterion 2.

• If there exists a SESE containing N that does not contain any
  definition, φ- or σ-function D for v, then N does not require a φ- or
  σ-function for v by lemma 5.1.

• Let us suppose every D_i containing a definition, φ- or σ-function
  for v dominates N. If N requires a φ-function for v, there exist
  paths D_1 →⁺ N and D_2 →⁺ N containing no nodes in common but
  N. We use these paths to construct simple paths START →⁺ D_1 →⁺ N
  and START →⁺ D_2 →⁺ N. By the definition of a dominator, every
  path from START to N must contain every D_i. But D_1 →⁺ N cannot
  contain D_2, and if START →⁺ D_1 contains D_2, we can make a path
  START →⁺ D_2 →⁺ N which does not contain D_1 by using the D_1-free
  path D_2 →⁺ N. The assumption leads to a contradiction; thus, there
  must exist some D_i which does not dominate N if N is required to
  have a φ-function for v. The symmetric argument holds for
  post-dominance and σ-functions.

This proves that the conditions are necessary. It is obvious from an
examination of conditions 1 and 2 of the SSI form definition and
lemma 5.1 that they are sufficient. □


In practice, the conditions of lemma 5.2 are too expensive to implement
directly. Instead, we use a conservative approximation to SSI form, which
allows us to place more φ- and σ-functions than minimal SSI requires (for
example, a φ-function for v at the circled node in Figure 5.8), while
satisfying the conditions of the SSI form definition. Our algorithm also
allows us to do pre-pruning of the SSI form during placement. The result
is not pruned SSI, but contains a tight superset of the φ- and
σ-functions that pruned form requires.


Theorem 5.1. Algorithm 5.3 places all the φ- and σ-functions required
by conditions 1 and 2 of the SSI form definition.


Proof. Lemma 5.1 states that the child region exclusion of Algorithm 5.3
does not cause required φ- or σ-functions to be omitted. Property 5.4
allows the omission of φ- and σ-functions for v at nodes where v is dead
when creating pruned form; MaybeLive may not return false for nodes
Figure 5.8: A flowgraph where Algorithm 5.3 places φ-functions
conservatively.


where v is not dead, but may return true at nodes where v is dead without
harming the correctness of the φ- and σ-function placement. □


5.3.4 Computing liveness 


Incorporating liveness information into the creation of pruned SSI form
appears to lead to a chicken-and-egg problem: although the pruned SSI
framework allows highly efficient liveness analysis, obtaining the liveness
information from the original program can be problematic. The fastest
sparse algorithm has stated time bounds of O(E + N^2) [7], which is likely
to be more expensive than the rest of the SSI form conversion. Luckily,
Kam and Ullman [21], in conjunction with an empirical study by Knuth
[23], show that liveness analysis is highly likely to be linear for reducible
flow-graphs. In our work this question is avoided, as we obtain our liveness
information directly from properties of the Java bytecode files that are our
input to the compiler. But in any case our algorithms allow conservative
approximation to liveness, so even in the case of non-reducible flow graphs
it should not be difficult to quickly generate a rough approximation.
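For readers who want a concrete source of liveness information to drive MaybeLive, the following Python sketch uses the textbook round-robin iterative dataflow formulation. It is neither the sparse algorithm of [7] nor the bytecode-derived information used in FLEX; the function names and the dictionary-based CFG encoding are assumptions of the sketch.

```python
def live_in_sets(succ, use, defs):
    """Iterate live_in[n] = use[n] | (live_out[n] - defs[n]) to a fixpoint.

    succ maps each node to its successors; use and defs map each node
    to the sets of variables it reads and writes.
    """
    live_in = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            live_out = set()
            for s in succ[n]:
                live_out |= live_in[s]
            new_in = use[n] | (live_out - defs[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
    return live_in


def make_maybe_live(live_in):
    """Wrap the result as a MaybeLive(v, n) predicate."""
    return lambda v, n: v in live_in[n]


# Hypothetical diamond: 1 defines x, 2 branches on c, 3 uses x, 5 merges.
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: []}
use = {1: set(), 2: {"c"}, 3: {"x"}, 4: set(), 5: set()}
defs = {1: {"x"}, 2: set(), 3: set(), 4: set(), 5: set()}
maybe_live = make_maybe_live(live_in_sets(succ, use, defs))
```

In the example x is live at the split node 2 but not at the merge node 5, so a placement pass using this predicate would keep a σ-function for x at node 2 and suppress a φ-function for x at node 5.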
Rename(G: CFG) =
1: Init(G) 
2: for each edge e leaving START do 
3: Search(e) 


Init(G: CFG) =
1: for each edge e in G do
2:   Marked[e] ← false
3: for each variable V in G do
4:   C(V) ← 0
5: ℰ ← create() /* create a new environment */


Inc(ℰ: Environment, V: variable): variable =
1: i ← C(V) + 1
2: C(V) ← i
3: put(ℰ, V, V_i)
4: return V_i


Algorithm 5.4: SSI renaming algorithm. 


5.3.5 Variable renaming 


Algorithm 5.4 performs variable renaming on a flow-graph with placed
φ- and σ-functions in a single depth-first traversal. When the algorithm
is complete, the control flow-graph will be in proper SSI form. The
variable renaming algorithm requires an Environment datatype, which is
defined in Figure 5.9. Using an imperative programming style, it is
possible to perform a sequence of any N operations on Environment as
defined in the figure in O(N) time; in a functional programming style
any N operations can be completed in O(N log N) time.¹⁰ As the coarse
structure of Algorithm 5.4 is a simple depth-first search, it is easy to
see that the Search procedure can be invoked from line 3 of Rename and
line 32 of Search a total of


¹⁰ The curious reader is referred to section 5.1 of Appel [4] for implementation details.
Search((s, d): edge) =
Require: s to be a node containing φ- or σ-functions, or START
Require: Marked[(s, d)] = false
 1: Marked[(s, d)] ← true
 2: beginScope(ℰ)
 3: if s is a node containing φ-functions then
 4:   for each φ-function P in s do
 5:     replace the destination V of P by Inc(ℰ, V)
 6: else if s is a node containing σ-functions then
 7:   for each σ-function S in s do
 8:     j ← WhichSucc((s, d))
 9:     replace the j-th destination V of S by Inc(ℰ, V)
10: loop /* now rename inside basic block */
11:   if d is a node containing φ-functions then
12:     for each φ-function P in d do
13:       j ← WhichPred((s, d))
14:       replace the j-th operand V of P by get(ℰ, V)
15:     break /* end of basic block */
16:   else if d is a node containing σ-functions then
17:     for each σ-function S in d do
18:       replace the operand V of S by get(ℰ, V)
19:     break /* end of basic block */
20:   /* ordinary assignment, at most one successor */
21:   for each variable V in RHS(d) do
22:     replace V by get(ℰ, V) in RHS(d)
23:   for each variable V in LHS(d) do
24:     replace V by Inc(ℰ, V) in LHS(d)
25:   if d has no successor then
26:     break /* end of basic block */
27:   s ← d
28:   d ← successor of d
29: end loop
30: for each successor n of d do
31:   if not Marked[(d, n)] then
32:     Search((d, n)) /* dfs recursion */
33: endScope(ℰ)
34: return


Algorithm 5.5: SSI renaming algorithm, cont. 
O(E) times; likewise its inner loop (lines 10 to 29) can be executed a
total of E times across all invocations of Search. A total of
U_SSA + D_SSA calls to the operations of the Environment datatype will
be made within all executions of Search. For the imperative
implementation of Environment, a total time bound of
O(E + U_SSA + D_SSA) for the variable renaming algorithm is obtained.

We have shown that Algorithm 5.3 places all the required φ- and
σ-functions in the control-flow graph according to SSI form conditions
1, 2, and 5; we will now show that Algorithm 5.4 renames variables
consistently with conditions 3 and 4, to prove that these algorithms
combined suffice to convert a program into SSI form. The SSI form is not
necessarily minimal, as we showed in section 5.3.3; the next section
will show how to post-process to create minimal or pruned SSI form.


Lemma 5.3. The stack trace of calls to Search defines a unique path 
through G from START. 


Proof. We will prove this lemma by construction. For every consecutive
pair of calls to Search we construct a path X →⁺ Y starting with the edge
(X, N_0) which is the argument of the first call, and ending with the edge
(N_n, Y) which is the argument of the second call. From line 28 of the
Search procedure we note that every edge (N_i, N_{i+1}) between the first
and last has exactly one successor. Furthermore, the call to Search on
line 32 defines a path starting with the edge with which our segment
X →⁺ Y ends; therefore the paths can be combined. By so doing from the
bottom of the call stack to the top we construct a unique path from
START. □


For brevity, we will hereafter refer to the canonical path constructed
in the manner of lemma 5.3 corresponding to the stack of calls to Search
when an edge e is first encountered as CP(e). Every edge in the CFG is
encountered exactly once by Search, so CP(e) exists and is unique for
every edge e in the CFG.


Lemma 5.4. SSI form condition 3 (φ-function naming) holds for
variables renamed according to Algorithm 5.4.

Proof. We restate SSI form condition 3 for reference:


For every node X containing a definition of a variable V in the
new program and node Y containing a use of that variable, there
exists at least one path X →⁺ Y and no such path contains a
definition of V other than at X.


We consider the canonical path CP((Y’, Y)) = START →⁺ Y’ → Y for some
use of a variable v at Y, constructed according to lemma 5.3 from the
stack trace of calls to Search when the edge (Y’, Y) is encountered.
This path is unique, although more than one canonical path may
terminate at Y when Y has more than one predecessor. These paths are
distinguished by the incoming edge to Y.¹¹ We identify each operand v_i
of a φ-function with the appropriate incoming edge e to ensure that
CP(e) is well defined and unique in the context of a use of v_i.

The canonical path START →⁺ Y must contain X, a definition of v, if Y
uses a variable defined in X, as Search renames all definitions (in lines 5,
9, and 24) and destroys the name mapping in ℰ just before it returns. The
call to Search which creates the definition of v must therefore always be
on the stack, and thus in the path CP((Y’, Y)), for any use to receive the
¹¹ Note that the notation (N,N’) for denoting edges does not always denote
an edge unambiguously; imagine a conditional branch where both the true
and false case lead to the same label. In such cases an additional
identifier is necessary to distinguish the edges. Alternatively, one may
split such edges to remove the ambiguity. We treat edges as uniquely
identifiable and leave the implementation to the reader.
name v. Note that this is true for φ-functions as well, which receive names
when the appropriate incoming edge (Y’, Y) is traversed, not necessarily
when the node Y containing the φ-function is first encountered.

We have proved that the path START →⁺ Y exists and contains X; now we
must prove that no other path from X to Y contains a definition of v.
Call this other definition D. Obviously D cannot be on our canonical
path START →⁺ X →⁺ Y, or line 24 would have caused Y to use a different
name. But as we just stated, all variable name mappings made at D will
be removed when the call to Search which touched D is taken off the call
stack. So D must be on the call stack, and thus on the canonical path; a
contradiction. Since assuming the existence of some other path X →⁺ Y
containing a definition of v leads to contradiction, no other such path
may exist, completing the proof of the lemma. □


Lemma 5.5. SSI form condition 4 (σ-function naming) holds for
variables renamed according to Algorithm 5.4.

Proof. We restate SSI form condition 4 for reference:

For every pair of nodes X and Y containing uses of a variable V
defined at node Z in the new program, either every path Z →⁺ X
must contain Y or every path Z →⁺ Y must contain X.


Let us assume there are paths Z →⁺ X and Z →⁺ Y violating this condition;
that is, let us choose nodes X and Y which use V and Z defining V such that
there exists a path P_1 from Z to X not containing Y and a path P_2 from
Z to Y not containing X. By the argument of the previous lemma, there
exists a canonical path P_3 = CP(e) from START to X through Z
corresponding to a stack trace of Search; note that P_3 need not contain
P_1. There are two cases:
Case I: P_3 does not contain Y. Then there is some last node N present
on both P_2: Z →⁺ N →⁺ Y and P_3: START →⁺ Z →⁺ N →⁺ X. By SSI
condition 2 this node N requires a σ-function for V. If N ≠ Z then
line 5 of Algorithm 5.4 would rename V along P_3 and X would not
use the same variable Z defined; if N = Z, then line 9 would have
ensured that X and Y used different names. Either case contradicts
our choices of X, Y, and Z.

Case II: P_3 does contain Y. Then consider the path START →⁺ Z →⁺ Y
along P_3, which does not contain X. The argument of case I applies
with X and Y reversed.

Any assumed violation of condition 4 leads to contradiction, proving the
lemma. □


Every path CP(e) corresponds to an execution state in a call to Search
at the point where e is first encountered. The value of the environment
mapping ℰ at this point in the execution of Algorithm 5.4 we will denote
as ℰ^e. For a node N having a single predecessor N_p and single successor
N_s, we will denote ℰ^(N_p,N) as ℰ^N_before and ℰ^(N,N_s) as ℰ^N_after.
It is obvious that ℰ^{N_p}_after = ℰ^N_before and
ℰ^N_after = ℰ^{N_s}_before when N_p and N_s, respectively, are also
single-predecessor single-successor nodes.


Lemma 5.6. SSI form condition 6 (correctness) holds for variables
renamed according to Algorithm 5.4. That is, along any possible
control-flow path in a program being executed, a use of a variable V_i
in the new program will always have the same value as a use of the
corresponding variable V in the original program.


Proof. We will use induction along the path N_0 → N_1 → ... → N_n. We
consider e_k = (N_k, N_{k+1}), the (k+1)th edge in the path, and assume
that, for all j < k, each variable V in the original program agrees with
the value of ℰ^{e_j}[V] = V_i in the new program. We show that
ℰ^{e_k}[V] agrees with V at edge e_k in the path.

Case I: k = 0. The base case is trivial: the START node (N_0) contains
no statements, and along each edge e leaving START, ℰ^e[V] = V_0. By
definition V_0 agrees with V at the entry to the procedure.


Case II: k > 0 and N_k has exactly one predecessor and one successor.
If N_k is single-entry single-exit, then it is not a φ- or σ-function.
As an ordinary assignment, it will be handled by lines 20 to 24 of
Algorithm 5.5. By the induction hypothesis (which tells us that the
uses at N_k correspond to the same values as the uses in the original
program) and the semantics of assignment, the mapping ℰ^{N_k}_after
is easily verified to be valid when ℰ^{N_k}_before is valid. Thus the
value of every original variable V corresponds to the value of the new
variable ℰ^{N_k}_after[V] = ℰ^{e_k}[V] on e_k.


Case III: k > 0 and N_k has multiple predecessors and one successor. In
this case N_k may have multiple φ-functions in the new program, and
by the definition in section 3 N_k has no statements in the original
program. Thus the value of any variable V in the original program
along edge e_k is identical to its value along edge e_{k-1}. We need
only show that the value of the variable ℰ^{e_{k-1}}[V] is the same as
the value of the variable ℰ^{e_k}[V] in the new program. For any
variable V not mentioned in a φ-function at N_k this is obvious. Each
variable defined in a φ-function will get the value of the operand
corresponding to the incoming control-flow path edge. The relevant
lines in Algorithm 5.5 start with 13 and 14, where we see that the
operand corresponding to edge e_{k-1} of a φ-function for V correctly
gets ℰ^{e_{k-1}}[V]. At line 5, we see that the destination of the
φ-function is correctly ℰ^{e_k}[V]. Thus the value of every original
variable V correctly corresponds to ℰ^{e_k}[V] by the induction
hypothesis and the semantics of the φ-functions.


Case IV: k > 0 and N_k has one predecessor and multiple successors. Here
N_k may have multiple σ-functions in the new program, and is empty
in the original program. The argument goes as for the previous
case. It is obvious that variables not mentioned in the σ-functions
correspond at e_k if they did at e_{k-1}. For variables mentioned in
σ-functions, line 18 shows that operands correctly get ℰ^{e_{k-1}}[V]
and line 9 shows that the destination corresponding to e_k correctly
gets ℰ^{e_k}[V]. Therefore the values of original variables V
correspond to the value of ℰ^{e_k}[V] by the induction hypothesis and
the semantics of the σ-functions.

Case V: N_k has multiple predecessors and multiple successors. Forbidden
by the CFG definition in section 3.

Therefore, on every edge of the chosen path, the values of the original
variables correspond to the values of the renamed SSI form variables.
The value correspondence at the path endpoint (a use of some variable V)
follows. □


Theorem 5.2. Algorithm 5.4 renames variables such that SSI form
conditions 3, 4, and 6 hold.


Proof. Direct from lemmas 5.4, 5.5, and 5.6. □


Theorem 5.3. Algorithms 5.3 and 5.4 correctly transform a program 
into SSI form. 


Proof. Theorem 5.1 proves that φ- and σ-functions are placed correctly to
satisfy conditions 1, 2 and 5 of the SSI form definition, and theorem 5.2
proves that variables are renamed correctly to satisfy conditions 3, 4
and 6. □
5.3.6 Pruning SSI form


The SSI algorithm can be run using any conservative approximation to the
liveness information (including the function MaybeLive(v,n) = true) if
unused code elimination¹² is performed to remove the extra φ- and
σ-functions added and create pruned SSI. Figure 5.10 and Algorithm 5.6
present an algorithm to identify unused code in O(NV_SSI) time, after
which a simple O(N) pass suffices to remove it. The complexity analysis
is simple: nodes and variables are visited at most once, raising their
value in the analysis lattice from unused to used. Nodes marked used are
never visited again. So MarkNodeUseful is invoked at most N times, and
MarkVarUseful is invoked at most V_SSI times. The calls to
MarkNodeUseful may examine at most every variable use in the program in
lines 3-5, taking O(U_SSI) time at worst. Each call to MarkVarUseful
examines at most one node (the single definition node for the variable,
if it exists) and in constant time pushes at most one node on to the
worklist, for a total of O(V_SSI) time. So the total run time of
FindUseful is O(U_SSI + V_SSI) = O(U_SSI).
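The worklist structure analysed above can be sketched in Python as follows. The sketch folds MarkNodeUseful and MarkVarUseful into one loop, represents the node and variable properties of Figure 5.10 as dictionaries and sets, and takes the side-effect test as a caller-supplied predicate; all of these are choices made for illustration rather than the FLEX implementation. The guard that skips nodes already marked useful is an addition of the sketch, to tolerate a node being queued more than once.

```python
def find_useful(nodes, uses, definitions, has_side_effects):
    """Return (useful nodes, useful variables).

    uses maps a node to the variables it reads; definitions maps a
    variable to the nodes defining it (at most one in SSI form).
    """
    node_useful = set()
    var_useful = set()
    work = [n for n in nodes if has_side_effects(n)]
    while work:
        n = work.pop()
        if n in node_useful:
            continue
        node_useful.add(n)
        # Everything used by a useful node is useful.
        for v in uses.get(n, ()):
            if v not in var_useful:
                var_useful.add(v)
                # The definition of a useful variable is useful.
                for d in definitions.get(v, ()):
                    if d not in node_useful:
                        work.append(d)
    return node_useful, var_useful


# Hypothetical program: n1: a1 = 1; n2: b1 = 2; n3: c1 = a1 + 1;
# n4: return c1.  Only n4 has a side effect.
nodes = ["n1", "n2", "n3", "n4"]
uses = {"n3": {"a1"}, "n4": {"c1"}}
definitions = {"a1": ["n1"], "b1": ["n2"], "c1": ["n3"]}
useful_nodes, useful_vars = find_useful(
    nodes, uses, definitions, lambda n: n == "n4")
```

Here n2 and b1 are never reached from the RETURN node, so a later removal pass could delete n2. Each variable enters var_useful once and each node's uses are scanned once, matching the counting argument in the text.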


5.3.7 Discussion 


Note that our algorithm for placing φ- and σ-functions in SSI form is
pessimistic; that is, we at first assume every node in the control-flow
graph with input arity larger than one requires a φ-function for every
variable and every node with output arity larger than one requires a
σ-function for every variable, and then use the PST, liveness
information, and unused code elimination to determine safe places to
omit φ- or σ-functions. Most
¹² We follow [44] in distinguishing unreachable code elimination, which
removes code that can never be executed, from unused code elimination,
which deletes sections of code whose results are never used. Both are
often called “dead code elimination” in the literature.
Data type Environment:


create(): Environment :
  make an environment with no mappings.

put(ℰ: Environment, v1: variable, v2: variable) :
  extend environment ℰ with a mapping from v1 to v2.

get(ℰ: Environment, v: variable): variable :
  return the current mapping in ℰ for v.

beginScope(ℰ: Environment) :
  save the current mapping of ℰ for later restoration.

endScope(ℰ: Environment) :
  restore the mapping of ℰ to that present at the last beginScope on ℰ.


Figure 5.9: Environment datatype for the SSI renaming algorithm. 
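
The Environment operations of Figure 5.9 can be sketched in Python with an undo log per open scope. This is only an illustrative sketch: the class layout, the snake_case method names, and the `_MISSING` sentinel are assumptions made here, not the FLEX implementation.

```python
_MISSING = object()  # hypothetical sentinel: "variable had no mapping before"


class Environment:
    """Scoped variable-renaming map, loosely following Figure 5.9."""

    def __init__(self):
        self._map = {}    # current mapping from v1 to v2
        self._undo = []   # one undo log (list) per open scope

    def put(self, v1, v2):
        # Remember the previous binding so end_scope can restore it.
        if self._undo:
            self._undo[-1].append((v1, self._map.get(v1, _MISSING)))
        self._map[v1] = v2

    def get(self, v):
        return self._map[v]

    def begin_scope(self):
        self._undo.append([])

    def end_scope(self):
        # Replay the log backwards so the oldest saved binding wins.
        for v, old in reversed(self._undo.pop()):
            if old is _MISSING:
                del self._map[v]
            else:
                self._map[v] = old
```

Logging only the overwritten bindings keeps endScope proportional to the number of put calls made inside the scope, rather than to the size of the whole environment.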


Operations on nodes:

  NodeUseful(n:node): boolean : Whether the results of this node are ever used
  Uses(n:node): set of variables : Variables for which this node contains a use

Operations on variables:

  VarUseful(v:variable): boolean : Whether there is some n for which Uses(n)
      contains v and NodeUseful(n) is true
  Definitions(v:variable): set of nodes : Nodes which contain a definition for v


Figure 5.10: Datatypes and operations used in unused code elimination. 
FindUseful(G: CFG) =
 1: let W be an empty work list
 2: for each variable v in G do
 3:   VarUseful(v) ← false
 4: for each node n in G in any order do
 5:   NodeUseful(n) ← false
 6:   if n is a CALL, RETURN, or other node with side-effects then
 7:     add n to W
 8:
 9: while W is not empty do
10:   let n be any element from W
11:   remove n from W
12:   MarkNodeUseful(n, W)

MarkNodeUseful(n: node, W: WorkList) =
 1: NodeUseful(n) ← true
 2: /* everything used by a useful node is useful */
 3: for each variable v in Uses(n) do
 4:   if not VarUseful(v) then
 5:     MarkVarUseful(v, W)

MarkVarUseful(v: variable, W: WorkList) =
 1: VarUseful(v) ← true
 2: /* The definition of a useful variable is useful */
 3: for each node n in Definitions(v) do
 4:   /* In SSI form, size(Definitions(v)) ≤ 1 */
 5:   if not NodeUseful(n) then
 6:     add n to W


Algorithm 5.6: Identifying unused code using SSI form. 
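
A minimal Python sketch of Algorithm 5.6 follows. The dictionary-based encoding of the CFG (`uses`, `definitions`, `has_side_effects`) and the name `find_useful` are assumptions for illustration only; the two marking procedures are inlined into a single worklist loop.

```python
def find_useful(nodes, uses, definitions, has_side_effects):
    """Return (useful_nodes, useful_vars).

    nodes:            iterable of node ids
    uses:             dict node -> set of variables used at that node
    definitions:      dict variable -> set of defining nodes (at most one in SSI)
    has_side_effects: set of nodes such as CALL or RETURN
    """
    node_useful = set()
    var_useful = set()
    # Seed the worklist with every node that has side effects.
    worklist = [n for n in nodes if n in has_side_effects]
    while worklist:
        n = worklist.pop()
        if n in node_useful:
            continue  # may have been queued twice before being marked
        node_useful.add(n)
        # Everything used by a useful node is useful.
        for v in uses.get(n, ()):
            if v not in var_useful:
                var_useful.add(v)
                # The definition of a useful variable is useful.
                for d in definitions.get(v, ()):
                    if d not in node_useful:
                        worklist.append(d)
    return node_useful, var_useful
```

For a four-node procedure `a = 1; b = 2; c = a + 1; return c`, only the definition of `b` is left unmarked, so a later O(N) sweep can delete it.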
[Figure: control-flow graph running from START to END]

Figure 5.11: A worst-case CFG for “optimistic” algorithms.


SSA construction algorithms, by contrast, are optimistic; they assume no
φ- or σ-functions are needed and attempt to determine where they are
provably necessary. In my experience, optimistic algorithms tend to have
poor time bounds because of the possibility of input graphs like the one
illustrated in Figure 5.11. Proving that all but two nodes require φ- and/or
σ-functions for the variable a in this example seems to inherently require
O(N) passes over the graph; each pass can prove that φ- or σ-functions are
required for only those nodes adjacent to nodes tagged in the previous pass.
Starting with the circled node, the φ- and σ-functions spread one node left
on each pass. On the other hand, a pessimistic algorithm assumes the
correct answer at the start, fails to show that any φ- or σ-functions can be


removed, and terminates in one pass. 
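
The pass count argued above can be illustrated with a toy model. The sketch below is not the graph of Figure 5.11: it assumes a simple chain of n nodes in which the rightmost node is tagged first and each pass can tag only the left neighbour of an already-tagged node. The function name `optimistic_passes` is hypothetical.

```python
def optimistic_passes(n):
    """Count passes needed to tag every node of a chain 0..n-1.

    Tagging starts at node n-1; a pass may only tag nodes whose right-hand
    neighbour is already tagged. The final pass is the one that finds
    nothing new and lets the algorithm terminate.
    """
    tagged = {n - 1}
    passes = 0
    while True:
        passes += 1
        newly = {i - 1 for i in tagged if i > 0} - tagged
        if not newly:
            return passes
        tagged |= newly
```

In this model the number of passes equals the number of nodes, which is the O(N) behaviour described for optimistic algorithms.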


5.4 Time and space complexity of SSI form 


Time and space complexity bounds for sparse evaluation frameworks are
often misleadingly described in the literature as “linear,” regardless of
what the O-notation runtime bounds actually are. A canonical example is
[38], which states that for SSA form, “the number of φ-nodes needed remains
linear.”


Typically Cytron [11] is cited; however, that reference actually reads: 
[Plot: “Linearity of uses in SSI form”]

Figure 5.12: Number of uses in SSI form as a function of procedure length.

Figure 5.13: Number of original variables as a function of procedure length.
For the programs we tested, the plot in [Figure 21 of Cytron’s
paper] shows that the number of φ-functions is also linear in
the size of the original program.


It is important to note that Cytron’s claim is based not on algorithmic
worst-case complexity bounds, but on empirical evidence. This reasoning is not
unjustified; Knuth [23] showed in 1974 that “human-generated” programs 
almost without exception show properties favorable to analysis; in particu- 
lar shallow maximum loop nesting depth. Wegman and Zadeck [44] clearly 


make this distinction by noting that: 


In theory the size [of the SSA form representation] can be O(EV), 
but empirical evidence indicates that the work required to com- 


pute the SSA graph is linear in the program size. 


Our worst-case space complexity bounds for SSI form are identical to SSA 
form — O(EV) — but in this section we will endeavour to show that typical 
complexities are likewise “linear in the program size.” 

The total runtime for SSI placement and subsequent pruning, including
the time to construct the PST, is O(E + NV_0 + U_SSI). For most programs
E will be a small constant factor multiple of N; as Wegman and Zadeck
[44] note, most control flow graph nodes will have at most two successors.
For those graphs where E is not O(N), it can be argued that E is the more
relevant measure of program complexity.¹³

Thus the “linearity” of our SSI construction algorithm rests on the 
quantities NV_0 and U_SSI. Figures 5.12 and 5.13 present empirical data
for V_0 and U_SSI on a sample of 1,048 Java methods. The methods varied
in length from 4 to 6,642 statements and were taken from the dynamic 


¹³ We will not follow Cytron [11] in defining a new variable R to denote
max(N,E,...) to avoid following him in declaring worst-case complexity
O(R^3) and leaving it to the reader to puzzle out whether O(N^6) (!) is
really being implied.
call-graph of the FLEX compiler itself, which includes large portions of
the standard Java class libraries. Figure 5.12 shows convincingly that U_SSI
grows as N for large procedures, and Figure 5.13 supports an argument
that V_0 grows very slowly and that the quantity NV_0 would tend to grow
as N'?. This would argue for a near-linear practical run-time. 

In contrast, Cytron’s original algorithm for SSA form had theoretical
complexity O(E + V_SSA|DF| + NV_SSA). Cytron does not present empirical
data for V_SSA, but one can infer from the data he presents for “number of
introduced φ-functions” that V_SSA behaves similarly to V_SSI; that is, it
grows as N, not as V_0. It is frequently pointed out¹⁴ that the |DF| term,
the size of the dominance frontier, can be O(N^2) for common programming
constructs (repeat-until loops), which indicates that the V_SSA|DF| term
in Cytron’s algorithm will be O(N^2) at best and at times as bad as O(N^3).

Note that the space complexity of SSI form, which may be O(EV) in the
worst case (φ- and σ-functions for every variable inserted at every node) is
certainly not greater than U_SSI, and thus Figure 5.12 shows linear practical


space use. 


6 Uses and applications of SSI 


The principal benefits of using SSI form are the ability to do predicated
and backward dataflow analyses efficiently. Predicated analysis means
that we can use information extracted from branch conditions and control
flow. The σ-functions in SSI form provide a variable naming that allows
us to sparsely associate the predication information with variable names
at control flow splits. The σ-functions also provide a reverse symmetry to
SSI form that allows efficient backward dataflow analyses like liveness and


¹⁴ See Dhamdhere [12], for example.




anticipatability. 

In this section, we will briefly sketch how SSI form can be applied 
to backwards dataflow analyses, including anticipatability, an important 
component of partial redundancy elimination. We will then describe in de- 
tail our Sparse Predicated Typed Constant propagation algorithm, which 
shows how the predication information of SSI form may be used to advan- 
tage in practical applications, including the removal of array bounds and 
null-pointer checks. Lastly, we will describe an extension to SPTC that 


allows bitwidth analysis, and the possible uses of this information. 


6.1 Backward Dataflow Analysis 


Backward dataflow analyses are those in which information is propa- 
gated in the direction opposite that of program execution [29]. There is 
general agreement [20, 7, 45] that SSA form is unable to directly handle 
backwards dataflow analyses; liveness is often cited as a canonical exam- 
ple. 

However, SSI form allows the sparse computation of such backwards
properties. Liveness, for example, comes “for free” from pruned SSI form:
every variable is live in the region between its use and sole definition.
Property 5.2 states that every non-φ-function use of a variable is dominated
by the definition; Cytron [11] has shown that φ-functions will always be
found on the dominance frontier. Thus the live region between definition
and use can be enumerated with a simple depth-first search, taking
advantage of the topological sorting by dominance that DFS provides [29].
Because of φ-function uses, the DFS will have to look one node past its
spanning-tree leaves to see the φ-functions on the dominance frontier; this
does not
change the algorithmic complexity. 
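
The idea that the live region is exactly the set of nodes between the sole definition and its uses can be sketched as follows. Note the substitution: this sketch walks backwards from each use over CFG predecessors and stops at the definition, rather than performing the dominance-ordered forward DFS described above. The predecessor-map encoding and the name `live_nodes` are assumptions for illustration.

```python
def live_nodes(preds, def_node, use_nodes):
    """Return the nodes at whose entry the variable is live.

    preds:     dict node -> iterable of CFG predecessors
    def_node:  the single node defining the variable
    use_nodes: nodes containing a use of the variable
    """
    live = set()
    stack = list(use_nodes)
    while stack:
        n = stack.pop()
        if n in live or n == def_node:
            continue  # already visited, or the walk reached the definition
        live.add(n)
        stack.extend(preds.get(n, ()))
    return live
```

Every node outside the returned set takes the default property for liveness, which is "not live".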


Computation of other dataflow properties will use this same enumeration
routine to propagate values computed on the sparse SSI graph to the
intermediate nodes on the control-flow graph. Formally, we can say that
the dataflow property for variable v at node N is dependent only on the
properties at nodes D and U, defining and using v, for which there is a path
D ⇝ U containing N. There is a “default” property which holds for nodes
on no such path from a definition to use; for liveness the default property is
“not live.” The remainder of this section will concentrate on the dataflow
properties at use and definition points.


A slightly more complicated backward dataflow property is very busy
expressions; this analysis is somewhat obsolete as it serves to save code
space, not time. This in turn is related to partial and total
anticipatability.


Definition 6.1. An expression e is very busy at a point P of the program
iff it is always subsequently used before it is killed [29].

Definition 6.2. An expression e is totally (partially) anticipatable at
a point P if, on every (some) path in the CFG from P to END, there is
a computation of e before an assignment to any of the variables in e [20].


Johnson and Pingali [20] show how to reduce these properties of ex- 
pressions to properties on variables. We will therefore consider properties 
BSY(v,N), ANT(v,N), and PAN(v, N) denoting very busy, totally antici- 
patable, and partially anticipatable variables v at some program point N. 

To compute BSY, we start with pruned SSI form. Any variable defined
in a φ- or σ-function is used at some point, by definition. So for statements
at a point P we have the rules:

\[
\begin{array}{ll}
v \leftarrow \ldots & \mathit{BSY}_{in}(v, P) = \mathit{false} \\
\ldots \leftarrow v & \mathit{BSY}_{in}(v, P) = \mathit{true} \\
x = \phi(y_0, \ldots, y_n) & \mathit{BSY}_{in}(y_i, P) = \mathit{BSY}_{out}(x, P) \\
(x_0, \ldots, x_n) = \sigma(y) & \mathit{BSY}_{in}(y, P) = \bigwedge_{i=0}^{n} \mathit{BSY}_{out}(x_i, P)
\end{array}
\]


Total anticipatability, in the single variable case, is identical to BSY. 


Partial anticipatability for a variable v at point P follows the rules: 


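\[
\begin{array}{ll}
x = \phi(y_0, \ldots, y_n) & \mathit{PAN}_{in}(y_i, P) = \mathit{PAN}_{out}(x, P) \\
(x_0, \ldots, x_n) = \sigma(y) & \mathit{PAN}_{in}(y, P) = \bigvee_{i=0}^{n} \mathit{PAN}_{out}(x_i, P)
\end{array}
\]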


The present section is concerned more with feasibility than the mechan- 
ics of implementation; we refer the interested reader to [29] and [20] for 
details on how to turn the efficient computation of BSY, PAN and ANT 
into practical code-hoisting and partial-redundancy elimination routines, 
respectively. 
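
The only difference between the BSY and PAN rules at a σ-function is whether the values on the outgoing names are combined by conjunction or by disjunction. A small Python sketch makes this concrete; the function names are hypothetical and the boolean lists stand for the BSY_out or PAN_out values of the names x_0, ..., x_n.

```python
def in_at_phi(out_x):
    """x = φ(y_0..y_n): each operand y_i inherits the property of x."""
    return out_x


def bsy_in_at_sigma(bsy_out):
    """(x_0..x_n) = σ(y): y is very busy only if every x_i is very busy."""
    return all(bsy_out)


def pan_in_at_sigma(pan_out):
    """(x_0..x_n) = σ(y): y is partially anticipatable if some x_i is."""
    return any(pan_out)
```

At a two-way branch where the variable is used on only one arm, `bsy_in_at_sigma([True, False])` is false while `pan_in_at_sigma([True, False])` is true, matching the "every path" versus "some path" wording of the definitions.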

We note in passing that the sophisticated strength-reduction and code- 
motion techniques of SSAPRE [22] are applicable to an SSI-based represen- 
tation, as well, and may benefit from the predication information available 
in SSI. The remainder of this section will focus on practical implementa- 


tions of predicated analyses using SSI form. 


6.2 Sparse Predicated Typed Constant Propagation 


Sparse Predicated Typed Constant (SPTC) Propagation is a powerful anal- 
ysis tool which derives its efficiency from SSI form. It is built on Wegman 
and Zadeck’s Sparse Conditional Constant (SCC) algorithm [44] and re- 


moves unnecessary array-bounds and null-pointer checks, computes vari- 


Figure 6.1: Three-level value lattice and two-level executability lattice for 
SCC. 


Table 6.1: Meet and binary operation rules on the SCC value lattice. 


able types, and performs floating-point- and string-constant-propagation 
in addition to the integer constant propagation of standard SCC. 

We will describe this algorithm incrementally, beginning with the stan- 
dard SCC constant-propagation algorithm for review. Wegman and Zadeck’s 
algorithm operates on a program in SSA form; we will call this SCC/SSA 
to differentiate it from SCC/SSI, using the SSI form, which we will describe 
in section 6.2.2. Section 6.3 on page 72 will discuss an extension to SPTC 


which does bit-width analysis. 


6.2.1 Wegman and Zadeck’s SCC/SSA algorithm 


The SCC algorithm works on a simple three-level value lattice asso- 
ciated with variable definition points and a two-level executability lattice 


associated with flow-graph edges. These lattices are shown in Figure 6.1. 
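
As a concrete picture of the three-level value lattice, the following Python sketch encodes it with two sentinels and plain integers. It follows this document's convention that ⊥ means "no information yet" and ⊤ means "more than one dynamic value possible". The contents of Table 6.1 are not reproduced here, so the exact rules below (in particular that an unknown operand makes a binary operation unknown) are assumptions in the spirit of Wegman and Zadeck's algorithm.

```python
import operator

BOTTOM = "bottom"  # no information about the value is known yet
TOP = "top"        # the variable may take more than one dynamic value


def meet(a, b):
    """Combine two lattice values at a φ-node (least upper bound)."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def binop(op, a, b):
    """Lattice value of 'a op b' for a binary integer operation."""
    if a == BOTTOM or b == BOTTOM:
        return BOTTOM      # assumed: wait until both operands are known
    if a == TOP or b == TOP:
        return TOP
    return op(a, b)        # both operands are integer constants
```

Because a value can only move from `BOTTOM` to a constant to `TOP`, each variable changes at most twice, which is the lattice-depth argument behind the algorithm's running time.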
Init(G:CFG) =
 1: E_e ← ∅
 2: E_n ← ∅
 3: for each variable v in G do
 4:   if some node n defines v then
 5:     V[v] ← ⊥
 6:   else
 7:     V[v] ← ⊤ /* Procedure arguments, etc. */

Analyze(G:CFG) =
 1: let r be the start node of graph G
 2: E_n ← E_n ∪ {r}
 3: W_n ← {r}
 4: W_v ← ∅
 5:
 6: repeat
 7:   if W_n is not empty then
 8:     remove some node n from W_n
 9:     if n has only one outgoing edge e and e ∉ E_e then
10:       RaiseE(e)
11:     Visit(n)
12:   if W_v is not empty then
13:     remove some variable v from W_v
14:     for each node n containing a use of v do
15:       Visit(n)
16: until both W_v and W_n are empty


Algorithm 6.1: SCC algorithm for SSA form. 
RaiseE(e:edge) =
 1: /* When called, e ∉ E_e */
 2: E_e ← E_e ∪ {e}
 3: let n be the destination of edge e
 4: if n ∉ E_n then
 5:   E_n ← E_n ∪ {n}
 6:   W_n ← W_n ∪ {n}

RaiseV(v:variable, L:lattice value) =
 1: if V[v] ⊏ L then
 2:   V[v] ← L
 3:   W_v ← W_v ∪ {v}

Visit(n:node) =
 1: for each assignment “v ← x ⊕ y” in n do
 2:   RaiseV(v, V[x] ⊕ V[y]) /* binop rule: see table 6.1 */
 3:
 4: for each assignment “v ← MEM(...)” or “v ← CALL(...)” in n do
 5:   RaiseV(v, ⊤)
 6:
 7: for each assignment “v ← φ(x1,...,xn)” in n do
 8:   for each variable xi corresponding to predecessor edge ei of n do
 9:     if ei ∈ E_e then
10:       RaiseV(v, V[v] ⊓ V[xi]) /* meet rule: see table 6.1 */
11:
12: for each branch “if v goto e1 else e2” in n do
13:   L ← V[v]
14:   if L = ⊤ or L = c where c signifies “true” and e1 ∉ E_e then
15:     RaiseE(e1)
16:   if L = ⊤ or L = c where c signifies “false” and e2 ∉ E_e then
17:     RaiseE(e2)


Algorithm 6.2: SCC algorithm for SSA form, cont. 
Associating a lattice value with a definition point is a conservative
statement that, for all possible program paths, the value of that variable
has a certain property. The value lattice is, formally, Int_⊥^⊤; the lattice
value ⊥ signifies that no information about the value is known, the lattice
value ⊤ indicates that it is possible that the variable has more than one
dynamic value, and the other lattice entries (corresponding to integer
constants and occupying a flat space between ⊤ and ⊥) indicate that the
variable can be proven to have a single constant value in all runs of the
program.¹⁵ Similarly, the executability lattice indicates whether it is
possible that the control flow edge is traversed in some execution of the
program (marked “executable”), or if it can be proven that the edge is never
traversed in any valid program path (marked “not executable”). The
algorithm works with SSA form, and is presented as Algorithm 6.1. Binary
operations on lattice values and combination at φ-nodes follow the rules in
Table 6.1; notice that the meet operation (⊓) is simply the least upper
bound on the lattice. The time complexity of SCC/SSA can be found easily:
the procedure RaiseE puts each node on the W_n worklist at most once, and
RaiseV puts a variable on the W_v worklist at most D − 1 times, where D is
the maximum lattice depth. The Visit procedure can thus be invoked a
maximum of N times by line 11 of the Analyze procedure of Algorithm 6.1,
and a maximum of U_SSA(D − 1) times by line 15, where U_SSA is the number
of variable uses in the SSA representation of the program. The lattice
depth D is the constant 3 in this version of the algorithm, so it drops out
of the expression. The RaiseE procedure itself is called at most E times.
The time complexity is


¹⁵ Note that we follow the ⊤ and ⊥ conventions used in semantics and
abstract interpretation; authors in dataflow analysis (including Wegman and
Zadeck in their SCC paper [44]) often use contrary definitions, letting ⊤
mean undefined and ⊥ indicate overdefinition. As section 7.3 will discuss
the semantics of SSI+ at length, we thought it best to adhere to one set of
definitions consistently, instead of switching mid-paper.
foo = f();              foo0 = f();
if (foo == 1)           if (foo0 == 1)
                          (foo1, foo2) = σ(foo0)
  bar = foo + 1;          bar0 = foo1 + 1;
else                    else
  bar = 2;                bar1 = 2;
                        bar2 = φ(bar0, bar1)


Figure 6.2: A simple constant-propagation example. 


thus O(E + N + U_SSA(D − 1)), which simplifies to O(E + U_SSA).


6.2.2 SCC/SSI: predication using σ-functions.


Porting the SCC algorithm from SSA to SSI form immediately increases 
the number of constants we can find. A simple example is shown in 
Figure 6.2: the version of the program on the right is in SSI form, and 
SCC/SSI, unlike SCC/SSA, can determine that foo1 is a constant with
value 1 (although nothing can be said about the value of foo0 or foo2) and
therefore that bar0, bar1, and bar2 are constants with the value 2. SSI
form creates a new name for foo at the conditional branch to indicate that
more information about its value is known. 
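
The example of Figure 6.2 can be traced by hand with a few lines of Python. This is a sketch under stated assumptions: the string sentinels, the helper names, and the reading of the true-branch rule as "take the lower of the two tested lattice values" are illustrative choices made here, and the right-hand program is taken to be `foo0 = f(); (foo1, foo2) = σ(foo0); bar0 = foo1 + 1; bar1 = 2; bar2 = φ(bar0, bar1)`.

```python
BOTTOM, TOP = "bottom", "top"  # no information / more than one value


def meet(a, b):
    """φ-function combination (least upper bound on the flat lattice)."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def add(a, b):
    """Binop rule for '+', assuming an unknown operand yields BOTTOM."""
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    return a + b


def refine_on_equal(x, y):
    """Value of the true-branch σ name for a test 'x == y'."""
    if x == TOP:
        return y
    if y == TOP:
        return x
    if BOTTOM in (x, y):
        return BOTTOM
    return x  # both constants; they are equal if the true edge is executable


def figure_6_2():
    foo0 = TOP                        # result of the call f()
    foo1 = refine_on_equal(foo0, 1)   # name used on the true branch
    foo2 = foo0                       # name used on the false branch
    bar2_ssi = meet(add(foo1, 1), 2)  # SSI: φ(bar0, bar1)
    bar2_ssa = meet(add(foo0, 1), 2)  # SSA: no renaming at the branch
    return foo1, foo2, bar2_ssi, bar2_ssa
```

Calling `figure_6_2()` yields the constant 1 for `foo1` and the constant 2 for the SSI version of `bar2`, while the SSA version of `bar2` is `TOP`.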

Only the Visit procedure must be updated for SCC/SSI: lattice update 
rules for σ-functions must be added. Algorithm 6.3 shows a new Visit
procedure for the two-level integer constant lattice of Wegman and Zadeck’s 
SCC/SSA; with this restricted value set only integer equality tests tap the 
algorithm’s full power. The utility of SCC/SSI’s predicated analysis will 
become more evident as the value lattice is extended to cover more constant 
types. 

The time complexity of the updated algorithm is identical to that of 
Visit(n:node) =
 1: /* Assignment rules as on page 58 */
 2:
 3: for each branch “if x = y goto e1 else e2” in n do
 4:   if L[x] = ⊤ or L[y] = ⊤ then
 5:     RaiseE(e1)
 6:     RaiseE(e2)
 7:   else if L[x] = c and L[y] = d then
 8:     if c = d then
 9:       RaiseE(e1)
10:     else
11:       RaiseE(e2)
12:   for each assignment “(v1,v2) ← σ(v0)” associated with this branch do
13:     if edge e1 ∈ E_e and variable v0 is the x or y in the test then
14:       RaiseV(v1, min(L[x], L[y]))
15:     else if edge e1 ∈ E_e then
16:       RaiseV(v1, L[v0])
17:     if edge e2 ∈ E_e then /* False branch */
18:       RaiseV(v2, L[v0])
19:
20: /* Obvious generalization applies for tests like “x ≠ y” */


Algorithm 6.3: A revised Visit procedure for SCC/SSI. 
[Lattice diagram; flat entries: float, double, int, long, String]


Figure 6.3: SCC value lattice extended to Java primitive value domain. 


SCC/SSA: O(E + U_SSA), by the same argument as before.


6.2.3 Extending the value domain 


The first simple extension of the SCC value lattice enables us to represent 
floating-point and other values. For this work, we extended the domain 
to cover the full type system of Java bytecode [15]; the extended lattice is 
presented in Figure 6.3. The figure also introduces the abbreviated lattice 
notation we will use through the following sections; it is understood that 
the lattice entry labelled “int” stands for a finite-but-large set of incom- 
parable lattice elements, consisting (in this case) of the members of the 
Java int integer type. Java ints are 32 bits long, so the “int” entry
abbreviates 2^32 lattice elements. Similarly, the “double” entry encodes not
the infinite domain of real numbers, but the domain spanned by the Java
double type which has fewer than 2^64 members.¹⁶ The Java String type is
also included, to allow simple constant string coalescing to be performed. 
The propagation algorithm over this lattice is a trivial modification to Al- 
gorithm 6.3, and will be omitted for brevity. In the next sections, the 


“int” and “long” entries in this lattice will be summarized as “Integer Con- 


¹⁶ In IEEE-standard floating-point, some possible bit patterns are not valid
number encodings.
                      Source language   Avg. depth   Max. depth
FLEX infrastructure   Java
javac compiler        Java
NeXTStep 3.2†         Objective-C
Objectworks 4.1†      Smalltalk
† indicates data obtained from Muthukrishnan and Müller [28].


Table 6.2: Class hierarchy statistics for several large O-O projects. 


stant”, the “float” and “double” entries as “Floating-point Constant”, and 
the “String” entry as “String Constant”. As the lattice is still only three 
levels deep, the asymptotic runtime complexity is identical to that of the 


previous algorithm. 


6.2.4 Type analysis 


In Figure 6.4 we extend the lattice to compute Java type information. 
The new lattice entry marked “Typed” is actually forest-structured as 
shown in Figure 6.5; it is as deep as the class hierarchy, and the roots 
and leaves are all comparable to ⊤ and ⊥. Only the Visit procedure must
be modified; the new procedure is given as Algorithm 6.4. Because the
lattice L is deeper, the asymptotic runtime complexity is now
O(E + U_SSA D_c), where D_c is the maximum depth of the class hierarchy.
To form an estimate of the magnitude of D_c, Table 6.2 compares class
hierarchy statistics for several large object-oriented projects in various
source languages. Our FLEX compiler infrastructure, as a typical Java
example, has an average class depth of 1.91.¹⁷ In a forced example, of
course, one can make the class depth O(N); however, one can infer from the
data given that in real code the D_c term is not likely to make the
algorithm significantly non-linear.

¹⁷ Measured August 2, 1999; the infrastructure is under continuing development.
[Lattice diagram; entries: String Constant, Floating-point Constant, Integer Constant]


Figure 6.4: SCC value lattice extended with type information. 


[Lattice diagram; roots: java.lang.Object and the non-void primitive types;
class entries: java.lang.Number, java.lang.String, java.lang.Integer,
spanning D_c lattice levels; leaves: String Constant, Floating-point
Constant, Integer Constant]


Figure 6.5: “Typed” category of Figure 6.4 shown expanded. 
Visit(n:node) =
 1: for each assignment “v ← x ⊕ y” in n do
 2:   RaiseV(v, V[x] ⊕ V[y]) /* binop rule: see figure 6.6 */
 3:
 4: for each assignment “v ← MEM(...)” or “v ← CALL(...)” in n do
 5:   let t be the type of the MEM or CALL expression
 6:   RaiseV(v, t)
 7:
 8: for each assignment “v ← φ(x1,...,xn)” in n do
 9:   for each variable xi corresponding to predecessor edge ei of n do
10:     if ei ∈ E_e then
11:       RaiseV(v, ⊔{V[v], V[xi]}) /* meet rule: use least upper bound */
12:
13: for each branch “if x = y goto e1 else e2” in n do
14:   if Typed ⊑ L[x] or Typed ⊑ L[y] then
15:     RaiseE(e1)
16:     RaiseE(e2)
17:   else if L[x] = c and L[y] = d then /* if x and y are constants... */
18:     if c = d then
19:       RaiseE(e1)
20:     else
21:       RaiseE(e2)
22:   for each assignment “(v1,v2) ← σ(v0)” associated with this branch do
23:     if edge e1 ∈ E_e and variable v0 is the x or y in the test then
24:       /* type error in source program if L[x] and L[y] are incomparable */
25:       RaiseV(v1, min(L[x], L[y]))
26:     else if edge e1 ∈ E_e then
27:       RaiseV(v1, L[v0])
28:     if edge e2 ∈ E_e then /* False branch */
29:       RaiseV(v2, L[v0])
30:
31: /* Obvious generalization applies for tests like “x ≠ y” */
32: /* Obvious generalization applies for tests like “x instanceof C” */


Algorithm 6.4: Visit procedure for typed SCC/SSI. 
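
The least-upper-bound meet used at φ-functions in the typed lattice can be sketched for the class-hierarchy part alone. The sketch assumes single inheritance encoded as a child-to-parent dictionary, ignores interfaces and constants, and returns a `"TOP"` marker when two types sit under different roots; the function names and the encoding are hypothetical.

```python
def ancestors(cls, parent):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def least_upper_bound(a, b, parent, top="TOP"):
    """Nearest common ancestor of a and b in the hierarchy forest."""
    seen = set(ancestors(a, parent))
    for c in ancestors(b, parent):
        if c in seen:
            return c
    return top  # different roots, e.g. a primitive type and a class


# Example hierarchy fragment (class names taken from Figure 6.5).
PARENT = {
    "java.lang.Integer": "java.lang.Number",
    "java.lang.Number": "java.lang.Object",
    "java.lang.String": "java.lang.Object",
}
```

Each call walks at most the depth of the hierarchy, which is where the D_c factor in the running time comes from.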
int ⊕ int = int
long ⊕ {int, long} = long
float ⊕ {int, long, float} = float
double ⊕ {int, long, float, double} = double
String ⊕ {int, long, float, double, Object, ...} = String


Figure 6.6: Java typing rules for binary operations. 


A brief word on the roots of the hierarchy forest in Figure 6.5 is called 
for: Java has both a class hierarchy, rooted at java.lang.Object, and 
several primitive types, which we will also use as roots. The primitive
types include int, long, float, and double.¹⁸ Integer constants in the
lattice are comparable to and less than the int or long type; floating-point
constants are likewise comparable to and less than either float or double.
String constants are comparable to and less than the java.lang.String
non-primitive class type.

The void type, which is the type of the expression null, is also a
primitive type in Java; however we wish to keep x ⊓ y identical to ⊔{x, y}
(the least upper bound of x and y) while satisfying the Java typing rule
that null ⊓ x = x when x is a non-primitive type and not a constant. This
requires putting void comparable to but less than every non-primitive leaf
in the class hierarchy lattice.

The Java class hierarchy also includes interfaces, which are the means 


by which Java implements multiple inheritance. Base interface classes 


¹⁸ In the type system our infrastructure uses (which is borrowed from Java
bytecode) the char, boolean, short and byte types are folded into int.
[Lattice diagram; entries from top: Typed; Non-null Typed; then Fixed-length
Array, Integer Constant, Floating-point Constant, String Constant, Null
Constant]


Figure 6.7: Value lattice extended with array and null information. 


(which do not extend other interfaces) are additional roots in the hierarchy 
forest, although no examples of this are shown in Figure 6.5. 

Since untypeable variables are generally forbidden, no operation should
ever raise a lattice value above “Typed” to ⊤. The otherwise-unnecessary
⊤ element is retained to indicate error conditions.

This variant of the constant-propagation algorithm allows us to elim- 
inate unnecessary instanceof checks due to type-casting or type-safety 
checks. Section 6.2.6 will provide experimental validation of its utility. 

Finally, note that the ability to represent null as the void type in the 
lattice begins to allow us to address null-pointer checks, although because 
null ⊓ x = x for non-primitive types we can only reason about variables
which can be proven to be null, not those which might be proven to be 
non-null (which is the more useful case). The next section will provide a 


more satisfactory treatment. 


6.2.5 Addressing array-bounds and null-pointer checks 
At this point, we can expand the value lattice once more to allow elim- 
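\[
\begin{array}{l}
\forall C \in \mathit{Class},\ C_{\text{non-null}} \sqsubseteq C_{\text{possibly-null}} \\
\forall C \in \mathit{Class}_{\text{non-null}},\ \bigsqcup\{\mathit{void}, C\} \in \mathit{Class}_{\text{possibly-null}} \\
\forall C \in \mathit{Class}_{\text{possibly-null}},\ \mathit{void} \sqsubseteq C \\
\forall C \in \mathit{Class}_{\text{non-null}},\ (\mathit{void}, C) \notin\ \sqsubseteq
\end{array}
\]

Let A(C,n) be a function to turn a lattice entry representing a non-null
array class type C into the lattice entry representing the said array class
with known integer constant length n. Then for any non-null array class C
and integers i and j,

\[
\begin{array}{l}
A(C, i) \sqsubseteq C \\
(A(C, i), A(C, j)) \in\ \sqsubseteq \text{ if and only if } i = j
\end{array}
\]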


Figure 6.8: Extended value lattice inequalities. 


x = 5 + 6;
do {
  y = new int[x];
  z = x - 1;
  if (0 <= z && z < y.length)
    y[z] = 0;
  else
    ...
} while (P);


Figure 6.9: An example illustrating the power of combined analysis. 
Visit(n:node) =
 1: /* Binop and φ-function rules as in algorithm 6.4 */
 2:
 3: for each assignment “v ← MEM(...)” or “v ← CALL(...)” in n do
 4:   let t ∈ Class_possibly-null ∪ Class_primitive be the type of the MEM or CALL
 5:   RaiseV(v, t)
 6:
 7: for each array creation expression “v ← new T[x]” do
 8:   if L[x] is an integer constant then
 9:     RaiseV(v, A(T, L[x]))
10:   else
11:     RaiseV(v, T_non-null)
12:
13: for each array length assignment “v ← arraylength(x)” do
14:   if L[x] is an array of known constant length n then
15:     RaiseV(v, n)
16:   else
17:     RaiseV(v, int)
18:
19: /* Branch rules as in algorithm 6.4, with the obvious extension to allow tests
       against null to lower a lattice value from Class_possibly-null to Class_non-null. */


Algorithm 6.5: Visit procedure outline with array and null information. 
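
The array-creation and array-length rules of Algorithm 6.5 can be sketched as two small Python functions. The `ArrayValue` record, the use of `None` for "non-null array of unknown length", and the string `"int"` standing for the int type entry are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrayValue:
    """Lattice entry for a non-null array; length is None when unknown."""
    elem_type: str
    length: Optional[int]


def visit_new_array(elem_type, length_value):
    """Rule for 'v <- new T[x]': keep the length if x is an integer constant."""
    if isinstance(length_value, int):
        return ArrayValue(elem_type, length_value)   # A(T, L[x])
    return ArrayValue(elem_type, None)               # T, known non-null


def visit_array_length(array_value):
    """Rule for 'v <- arraylength(x)': a constant when the length is known."""
    if isinstance(array_value, ArrayValue) and array_value.length is not None:
        return array_value.length
    return "int"
```

With a constant length 11 flowing into `visit_new_array`, `visit_array_length` returns 11, so a later test such as `z < y.length` with constant `z = 10` can be decided by the ordinary constant-branch rule.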
if (10 < 0)
  throw new NegativeArraySizeException();
int[] A = new int[10];
if (0 < 0 || 0 >= A.length)
  throw new ArrayIndexOutOfBoundsException();
A[0] = 1;
for (int i=1; i < 10; i++) {
  if (i < 0 || i >= A.length)
    throw new ArrayIndexOutOfBoundsException();
  A[i] = 0;
}


Figure 6.10: Implicit bounds checks (underlined) on Java array references. 


ination of unnecessary array-bounds and null-pointer checks, based on our 
constant-propagation algorithm. The new lattice is shown in Figure 6.7; we 
have split the “Typed” lattice entry to enable the algorithm to distinguish
between non-null and possibly-null values,¹⁹ and added a lattice level for
arrays of known constant length. Some formal definition of the new value
lattice can be found in Figure 6.8; the meet rule is still the least upper 
bound on the lattice. Modifications to the Visit procedure are outlined 
in Algorithm 6.5. Notice that we exploit the pre-existing integer-constant 
propagation to identify constant-length arrays, and that our integrated ap- 
proach allows one-pass optimization of the program in Figure 6.9. 

Note that the variable renaming performed by the SSI form at control- 
flow splits is essential in allowing the algorithm to do null-pointer check 
elimination. However, the lattice we are using can remove bound checks 
from an expression A[k] when k is a constant, but not when k is a bounded
induction variable. In the example of Figure 6.10 on the preceding page,
the first two implicit checks are optimized away by this version of the
algorithm, but the loop-borne test is not.

¹⁹ Values which are always-null were discussed in the previous section;
they are identified as having primitive type void.

A typical array-bounds check (as shown in the example on the preceding
page) verifies that the index i of the array reference satisfies the
condition 0 ≤ i < n, where n is the length of the array.²⁰ By identifying
integer constants as either positive, negative, or zero the first half of the 
bounds check may be eliminated. This requires a simple extension of the 
integer constant portion of the lattice, outlined in Figure 6.11 on the facing 
page, with negligible performance cost. However, handling upper bounds 
completely requires a symbolic analysis that is out of the current scope 
of this work. Future work will use induction variable analysis and inte- 
grate an existing integer linear programming approach [36] to fully address 


array-bounds checks. 
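
The sign classification can be sketched in Python by representing each sign class as a set of the letters M (negative), Z (zero), and P (positive), written as a sorted string. This encoding and the function names are assumptions for illustration; they are one way to realize a sign lattice, not necessarily the layout of Figure 6.11.

```python
def sign_class(value):
    """Classify an integer constant as 'M', 'Z', or 'P'."""
    if value < 0:
        return "M"
    if value == 0:
        return "Z"
    return "P"


def join_sign(a, b):
    """Least upper bound of two sign classes, e.g. 'Z' and 'P' give 'PZ'."""
    return "".join(sorted(set(a) | set(b)))


def lower_bound_check_needed(index_class):
    """The test 0 <= i is redundant unless the index may be negative."""
    return "M" in index_class
```

A loop index that starts at a positive constant and is only ever joined with other non-negative values stays in the class `"P"` or `"PZ"`, so the lower half of its bounds check can be dropped even though its exact value is not constant.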


6.2.6 Experimental results 


The full SPTC analysis and optimization has been implemented in the
FLEX Java compiler platform.²¹ Some quantitative measure of the utility of
SPTC is given as Figure 6.12. The “run-times” given are intermediate rep- 
resentation dynamic statement counts generated by the FLEX compiler SSI 
IR interpreter. The FLEX infrastructure is still under development, and its 
backends are not stable enough to allow directly executable code. As such, 
the numbers bear a tenuous relation to reality; in particular branch delays 
on real architectures, which the elimination of null-pointer checks seeks to 
eliminate, are unrepresented. Furthermore, the intermediate representa- 
tion interpreter gives the same cycle-count to two-operand instructions as 


20Languages in which array indices start at 1 can be handled by slight modifications to 


the same techniques. 
21See section 8 for details of methodology. 
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Figure 6.11: An integer lattice for signed integers. A classification into 
negative (M), positive (P), or zero (Z) is grafted onto the standard flat 
integer constant domain. The (M-P) entry is duplicated to aid clarity. 


to loading constants, which tends to negate most of the benefit of constant 
propagation. As is obvious from the figure, the standard Wegman-Zadeck 
SCC algorithm, which has proven utility in practice, shows no improvement 
over unoptimized code due to the metric used. Even so, SPTC shows a 10% 
speed-up. It is expected that the improvement given in actual practice will 


be greater. 


Note that the speed-up is constant despite widely differing test cases.
The “Hello world” example actually executes quite a bit of library code
in the Java implementation; this includes numerous element-by-element
array initializations (due to the semantics of Java bytecode) which we expect
SPTC to excel at optimizing. But SPTC does just as well on the full FLEX
compiler (68,032 lines of source at the time the benchmark was run), which
shows that the speed-up is not limited to constant initialization code.

Figure 6.12: SPTC optimization performance.


6.3 Bit-width analysis


The SPTC algorithm can be extended to allow efficient bit-width analysis.
Bit-width analysis is a variation of constant propagation with the goal
of determining value ranges for variables. In this sense it is similar to, but
simpler than, array-bounds analysis: no symbolic manipulation is required
and the value lattice has N levels (where N is the maximum bit-width of
the underlying datatype) instead of 2^N. For C and Java programs, this
means that only 32 levels need be added to the lattice; thus the bit-width
analysis can be made efficient.

Bit-width analysis allows optimization for modern media-processing
instruction set extensions which typically offer vector processing of
limited-width types. Intel’s MMX extensions, for example, offer packed 8-bit,
16-bit, 32-bit and 64-bit vectors [30]. To take advantage of these functional
units without explicit human annotation, the compiler must be able to
guarantee that the data in a vector can be expressed using the limited
bit-width available. A simpler bit-width analysis in a previous work [3]
showed that a large amount of width-limit information can be extracted
from appropriate source programs; however, that work was not able to
intelligently compute widths of loop-bound variables due to the limitations
of the SSA form. Extending the bit-width algorithm to SSI form allows
induction variables width-limited by loop-bounds to be detected.

    -(m, p)                 = (p, m)
    (m_l, p_l) + (m_r, p_r) = (1 + max(m_l, m_r), 1 + max(p_l, p_r))
    (m_l, p_l) × (m_r, p_r) = (max(m_l + p_r, p_l + m_r), max(m_l + m_r, p_l + p_r))
    (0, p_l) ∧ (0, p_r)     = (0, min(p_l, p_r))
    (m_l, p_l) ∧ (m_r, p_r) = (max(m_l, m_r), max(p_l, p_r))

Figure 6.13: Some combination rules for bit-width analysis.


Bit-width analysis is also a vital step in compiling a high-level language
to a hardware description. General purpose programming languages do not
contain the fine-grained bit-width information that a hardware
implementation can take advantage of, so the compiler must extract it itself.
The work cited above [3] showed that this is viable and efficient.


The bit-width analysis algorithm has been implemented in the FLEX
compiler infrastructure. Because most types in Java are signed, it is
necessary to separate bit-width information into “positive width” and
“negative width.” This is just an extension of the signed value lattice of
Figure 6.11 to variable bit-widths. In practice the bit-widths are
represented by a tuple, extending the integer constant lattice with
(Int × Int), under the natural total ordering of Int. The tuple (0,0) is
identical to the constant 0, and the tuple (0,16) represents an ordinary
unsigned 16-bit data type. The ⊤ element is represented by an appropriate
tuple reflecting the source-language semantics of the value’s type.
Figure 6.13 presents bit-width combination rules for unary negation and for
binary addition, multiplication and bitwise-and. In practice, the rules would
be extended to more precisely handle operands of zero, one, and other small
constants.
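
A minimal Python sketch of such combination rules follows. Tuples are (negative width, positive width), so that (0,0) is the constant 0 and (0,16) an unsigned 16-bit quantity. The multiplication and bitwise-and rules follow Figure 6.13; the negation and addition rules shown are the natural sound choices and should be read as an assumption, as are all function names (`neg`, `add`, `mul`, `bit_and`, `width_of_const`).

```python
from typing import Tuple

Width = Tuple[int, int]  # (negative width m, positive width p)

def width_of_const(c: int) -> Width:
    """Smallest tuple describing a single known constant."""
    return (0, c.bit_length()) if c >= 0 else ((-c).bit_length(), 0)

def neg(a: Width) -> Width:
    m, p = a
    return (p, m)  # negation swaps the two widths (assumed rule)

def add(l: Width, r: Width) -> Width:
    (ml, pl), (mr, pr) = l, r
    # one extra bit on each side absorbs a possible carry (assumed rule)
    return (1 + max(ml, mr), 1 + max(pl, pr))

def mul(l: Width, r: Width) -> Width:
    (ml, pl), (mr, pr) = l, r
    # negative results come from mixed signs, positive from like signs
    return (max(ml + pr, pl + mr), max(ml + mr, pl + pr))

def bit_and(l: Width, r: Width) -> Width:
    (ml, pl), (mr, pr) = l, r
    if ml == 0 and mr == 0:
        # both operands non-negative: result fits in the narrower one
        return (0, min(pl, pr))
    return (max(ml, mr), max(pl, pr))
```

For example, masking any non-negative value with an 8-bit constant yields `bit_and((0, 32), width_of_const(255)) == (0, 8)`. Note that these rules are deliberately coarse; as the text says, a practical implementation would special-case small constants.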


7 An executable representation 


The Static Single Information (SSI) form, as presented in the first half
of this thesis, requires control-flow graph information in order to be
executable. We would like to have a demand-driven operational semantics for
SSI form that does not require control-flow information, freeing us to
reorder execution more flexibly.

In particular, we would like a representation that eliminates unnecessary 
control dependencies such as exist in the program of Figure 7.1 on the next 
page. A control-flow graph for this program, as it is written, will explicitly 
specify that no assignments to B[] will take place until all elements of A[] 
have been assigned; that is, the second loop will be control-dependent on 
the first. We would like to remove this control dependence in order to 
provide greater parallelism—in this case, to allow the assignments to A[] 
and B[] to take place in parallel, if possible. 

In addition, an executable representation allows us to more easily apply
the techniques of abstract interpretation [31]. Although abstract
interpretation may be applied to the original SSI form using information
extracted from the control flow graph, an executable SSI form allows more
concise (and thus, more easily derived and verified) abstract interpretation
algorithms.

    for (int i=0; i<10; i++)
        A[i] = x;
    for (int j=0; j<10; j++)
        B[j] = y;

Figure 7.1: An example of unnecessary control dependence: the second
loop is control-dependent on the first and so assignments to A[] and B[]
cannot take place in parallel.

The modifications outlined here extend SSI form to provide a useful and
descriptive operational semantics. We will call the extended form SSI*.
For clarity, we will call SSI form as originally presented SSI₀. We will
describe algorithms to construct SSI* efficiently, and illustrate analyses
and optimizations using the form.


7.1 Deficiencies in SSI₀


Although a demand-driven execution model can be constructed for SSI₀, it
fails to handle loops and imperative constructs well. SSI* form addresses
these deficiencies.


7.1.1 Imperative constructs, pointer variables, and side-effects 


The presentation of SSI₀ ignored pointers, concentrating on so-called
register variables. Extending SSI₀ to handle these imperative constructs is
quite easy: we simply define a “variable” S to represent an updatable store.
This variable is renamed and numbered as before, so that S0 represents the
initial contents of the store and S_i, i > 0 represents the contents of the
store after some sequence of writes. Figure 7.2 shows a simple imperative
program in SSI* form. Note that modifications to the store typically take
the previous contents of the store as input, and that subroutines with
side-effects modifying the store must be written in SSI* form such that they
both take a store and return a store.

    // swap A[i] and B[j]     // SSI* form:

    x = A[i];                 x0 = FETCH(S0, A0 + i0)
    y = B[j];                 y0 = FETCH(S0, B0 + j0)
    A[i] = y;                 S1 = STORE(S0, A0 + i0, y0)
    B[j] = x;                 S2 = STORE(S1, B0 + j0, x0)

Figure 7.2: Use of the “store variable” S_i in SSI* form.
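
The store-passing discipline of Figure 7.2 can be mimicked directly in Python. The sketch below is only an illustration under stated assumptions: the store is modeled as a read-only mapping from integer addresses to integer values, and `FETCH`, `STORE`, and `swap` are our own stand-ins for the operators in the figure, not part of FLEX.

```python
from types import MappingProxyType
from typing import Mapping

Store = Mapping[int, int]  # address -> value; each store version is read-only

def FETCH(store: Store, addr: int) -> int:
    """Read one location from a given version of the store."""
    return store[addr]

def STORE(store: Store, addr: int, value: int) -> Store:
    """Return a new store version; the input version is left untouched."""
    updated = dict(store)
    updated[addr] = value
    return MappingProxyType(updated)

def swap(S0: Store, A0: int, i0: int, B0: int, j0: int) -> Store:
    """The swap of Figure 7.2, with every store version named once."""
    x0 = FETCH(S0, A0 + i0)
    y0 = FETCH(S0, B0 + j0)
    S1 = STORE(S0, A0 + i0, y0)
    S2 = STORE(S1, B0 + j0, x0)
    return S2
```

Because each write produces a fresh name, the ordering of the two writes is expressed purely as a data dependence (S2 depends on S1, which depends on S0), which is what lets the representation dispense with explicit control flow here.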

The single monolithic store may provide aliasing at too coarse a resolu- 
tion to be useful. Decomposing the store into smaller regions is a straight- 
forward application of pointer analysis, which may benefit from an initial 
conversion of register variables to SSI₀ form. In type-safe languages, defining
multiple stores for differing type sets is a trivial implementation of basic
pointer analysis; Figure 7.3 shows a simple example of this form of decom- 
position using two different subtypes (Integer and Float) of a common 
base class (Number). Pointer analysis is a huge and rapidly-growing field 
which we cannot attempt to summarize here; suffice to say that the may- 
point-to relation from pointer analysis may be used to define a fine-grained 
model of the store. 

Proper sequencing among statements with side-effects may be handled 
in a similar way: a special SSI name is used/defined where side-effects occur 
to impose an implicit ordering. For maximum symmetry with the ‘store’ 
case, we will name this special variable S**. This variable may be further 
decomposed using effect analysis for more precision. 


    ⊢ N : Number, I : Integer, F : Float
    I ⊂ N and F ⊂ N

    if (P)                    // SSI* form:
      N = I;
    else                      N0 = φ(I0, F0)
      N = F;
    F.add(3.14159);           S^F_1 = CALL(add, S^F_0, F0, 3.14159)
    N.add(5);                 (S^I_1, S^F_2) = CALL(add, S^I_0, S^F_1, N0, 5)

Figure 7.3: Factoring the store (S_i) using type information in a type-safe
language.

Note that precise analysis of side-effects and the store is much more
important in C-like languages. The example on the left in Figure 7.4 shows
the difficulties one may encounter in dealing with pointer variables that
may rewrite SSI temporaries. It is possible to deal with this in the manner
of Figure 7.3 using explicit stores, and with sufficient analysis one may write
the SSI representation on the right in the figure. The source language for
our FLEX compiler does not encounter this difficulty: Java has no pointers
to base types, and so the compiler does not have to worry about values
changing “behind its back” as in the example.


7.1.2 Loop constructs 


The center column of Figure 7.5 shows a typical loop in SSI₀ form. Note
first that an explicit “control flow” expression (goto L1) is required in
order to make sense of the program. Note also that i1, i2 and i3 are
potentially dynamically assigned many times, although statically they have
only one definition each. This complicates any sort of demand-driven
semantics: should the φ-function demand the value of i0, or i3, when it is
evaluated the first time? Which of the values of i3 does it receive when the
φ-function is subsequently evaluated? A token-based dataflow interpretation
fails as well: it is easy to see that tokens for i flow around the loop
before flowing out at the end, but the token for j0 seems to be “used up”
in the first iteration.

    int x=1;              x0 = 1
    int y=2;              y0 = 2
    int *p = &x;          p0 = {x}   // p is of type “location set”
    if (P)
      p = &y;             p1 = {y}
                          p2 = φ(p0, p1)
    *p = 3;               (x1, y1) = DEREF(p2, 3)
    return x;             return x1

Figure 7.4: Pointer manipulation of local variables in C.


SSI* introduces a ξ-function in the block of φ-functions to clarify the
loop semantics. The right-hand column of Figure 7.5 illustrates the nature of
this function. The ξ-function arbitrates loop iteration, and will be defined
precisely by the operational semantics of SSI* form. For now note that
it relates iteration variables (the top tuple of the parameter and result
vectors) to loop invariants (the bottom tuple of the vectors). We followed
the statement ordering of SSI₀ in the figure, but unlike SSI₀, the statements
of SSI* could appear in any order without affecting their meaning, and so
the statement label L1 of the SSI₀ representation and its implicit
control-flow edge are unnecessary in SSI*.

    // a simple loop     // SSI₀ form:            // SSI* form:

    j=1;                 j0 = 1                   j0 = 1
    i=0;                 i0 = 0                   i0 = 0
    do                   L1:                      [i5; j1] = ξ([i3; j0])
    {                      i1 = φ(i0, i3)         i1 = φ(i0, i5)
      i+=j;                i2 = i1 + j0           i2 = i1 + j1
    } while (i<5);         P0 = (i2 < 5)          P0 = (i2 < 5)
                           if P0 goto L1          (i3, i4) = σ(P0, i2)
                         (i3, i4) = σ(i2)

Figure 7.5: A simple loop, in SSI₀ and SSI* forms. The ξ-function vectors
are written [top tuple; bottom tuple].


7.2 Definitions 


The signature characteristic of SSI* is its ξ-functions. These ξ-functions
exist in the same places φ-functions do, and control loop iteration. The
exact semantics may vary (the sections below present two different valid
semantics for ξ-functions), but informally they can be viewed as “time-warp”
operators. They take values from the “past” (previous iterations of
the loop or loop invariants valid when the loop began) and project them
into the “future” (the current loop iteration).

There is at most one ξ-function per φ-function block, and it always
precedes the φ-functions. Construction of ξ-functions takes place before
the renaming step associated with SSI form, and the ξ-functions are then
renamed in the same manner as any other definition. The top tuple of
the constructed ξ-function contains the names of all variables reaching
the guarded φ-function via a backedge, and the bottom tuple contains
all variables used inside the guarded loop that are not mentioned in the
header’s φ-function.

The SSI* form also has triggered constants. The time-oriented semantics
of SSI* dictate that each constant must be associated with a trigger
specifying for what times (cycles/loop iterations) the value of the constant
is valid. These are similar to the constant generators in some dataflow
machines [42]. The triggers for a constant c come from the variables
defined in the earliest applicable instruction post-dominated by the constant
definition statement v = c. This is designed to generate the trigger as
soon as it is known that the constant definition statement will always
execute. In practice it is necessary to introduce a bogus trigger variable,
C_T, which is generated at the START node and used to trigger any constants
otherwise without a suitable generator. If the use of the constant does not
post-dominate the START node, C_T will have to be threaded through φ- and
σ-functions to reach the earliest post-dominated node.


7.3 Semantics 


We will base the operational semantics of SSI* on a demand-driven data- 
flow model. We will define both a cycle-oriented semantics and an event- 
driven semantics, which (incidentally) correspond to synchronous and asyn- 
chronous hardware models. 

Following the lead of Pingali [31], we present Plotkin-style semantics 
[33] in which configurations are rewritten instead of programs. The con- 
figurations represent program state and transitions correspond to steps in 
program execution. The set of valid transitions is generated from the pro- 
gram text. 


The semantics operate over a lifted value domain V = Int_⊥. When
some variable t = ⊥_V we say it is undefined; conversely t ≠ ⊥_V indicates
that the variable is defined. “Store” metavariables S_i are not explicitly
handled by the semantics, but the extension is trivial with an appropriate
redefinition of the value domain V. Floating-point and other types are also
trivial extensions. The metavariables c and v stand for elements of V.

We also define a domain of variable names, Nam = {n0, n1, ...}. The
metavariables t and P stand for elements in Nam, although P will be
reserved for naming branch predicates.

A fixed set of “built-in” operators, op, is defined, of type V^n → V. If any
operator argument is ⊥, the result is also ⊥. Constants are implemented
as a special case of the general operator rule: an op producing a constant
has a single trigger input which does not affect the output.
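
The strictness of the built-in operators, and the treatment of a constant as an operator with a single trigger input, can be sketched in Python. This is an illustration only: Python's `None` stands in for the undefined value ⊥, and `lift` and `triggered_constant` are hypothetical helper names.

```python
from typing import Callable, Optional

Value = Optional[int]  # None plays the role of the undefined value (bottom)
BOTTOM: Value = None

def lift(op: Callable[..., int]) -> Callable[..., Value]:
    """Turn an ordinary integer operator into one that is strict in bottom."""
    def strict(*args: Value) -> Value:
        if any(a is BOTTOM for a in args):
            return BOTTOM  # any undefined argument makes the result undefined
        return op(*args)
    return strict

def triggered_constant(c: int) -> Callable[[Value], Value]:
    """A constant is an operator whose one input only says *when* it fires."""
    return lift(lambda _trigger: c)

add = lift(lambda a, b: a + b)
five = triggered_constant(5)
```

With these definitions `five(0)` and `five(17)` both yield 5, since the value of the trigger is ignored, while `five(BOTTOM)` stays undefined until the trigger arrives.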


7.3.1 Cycle-oriented semantics 


In the cycle-oriented semantics, configurations consist of an environment,
ρ, which maps names in Nam to values in V.

Definition 7.1.

1. An environment ρ : N → V is a finite function: its domain N ⊂ Nam is
   finite. The notation ρ[t ↦ c] represents an environment identical to ρ
   except for name t, which is mapped to c.

2. The null environment ρ_∅ maps every t ∈ N to ⊥_V.

3. A configuration consists of an environment. The initial configuration
   is ρ_∅[C_T ↦ 0] extended with mappings for procedure parameters. That
   is, all names in N are mapped to ⊥_V except for the default constant
   trigger C_T, mapped to 0,²² and any procedure parameters, mapped to
   their proper entry values.

Figure 7.6 shows the cycle-oriented transition rules for SSI* form. The
left column consists of definitions and the right column shows a
precondition on top of the line, and a transition below the line.
If the definition in the left column is present in the SSI* form and the
precondition on top of the line is satisfied, then the transition shown below
the line can be performed.

²² Any k ≠ ⊥_V would do.

Figure 7.6: Cycle-oriented transition rules for SSI*.


7.3.2 Event-driven semantics 


In the event-driven semantics, configurations consist of an event set and an
invariant store. The event set E contains definitions of the form t = c, and
the invariant store is a mapping from numbered ξ-functions in the source
SSI* form to a set of tuples representing saved values for loop invariants.
We define the following domains:

• Evt = Nam × V is the event domain. An event consists of a name-value
  pair. The metavariable e stands for elements of Evt.

• Xif ⊂ Int is used to number ξ-functions in the source SSI* form.
  There is some mapping function which relates ξ-functions to unique
  elements of Xif. The metavariable K stands for an element in Xif.

    t = φ(t1, ..., tn)          ⟨E[ti = v], S⟩ → ⟨E[t = v], S⟩

    ⟨t1, ..., tn⟩ = σ(P, t)     ⟨E[t = v][P = i], S⟩ → ⟨E[ti = v], S⟩

    [The remaining rules, for op and for the ξ-function (the latter
    updating and reading S[K], with 1 ≤ i ≤ n), are illegible in this copy.]

Figure 7.7: Event-driven transition rules for SSI*. In the last two rules K is
a statement-identifier constant which is unique for each source ξ-function.

A formal definition of our configuration domain is now possible:
Definition 7.2.

1. An event set E : Evt*. The notation E[t = c] represents an event
   set identical to E except that it contains the event (t, c). We say a
   name t is defined if (t, v) ∈ E for some v. For all distinct events
   (t1, v1), (t2, v2) ∈ E, t1 and t2 differ. This is equivalent to saying
   that no name t is multiply defined in an event set. This constraint is
   enforced by the transition rules, not by the definition of E.

2. An invariant store S : Xif → Evt* is a finite mapping from
   ξ-functions to event sets.

3. A configuration is a tuple ⟨E, S⟩ : Evt* × (Xif → Evt*) consisting of
   an event set and an invariant store. The initial configuration for
   procedure parameters p0, ..., pn mapped to non-⊥ values v0, ..., vn
   is ⟨{C_T = 0, p0 = v0, ..., pn = vn}, ∅⟩; that is, it consists of
   an empty event set extended with events for default constant trigger
   C_T and the procedure parameters, and an empty mapping for
   the invariant store.


Figure 7.7 shows the event-driven transition rules for SSI* form. As
before, the left column consists of definitions and the right column shows
an optional precondition above a line, and a transition.
If the definition in the left column is present in the SSI* form and the
precondition (if any) above the line is satisfied, then the transition can be
performed. Note that most transitions remove some event from the event
set E, replacing it with a new event. The invariant store S stores the values
of loop invariants for regeneration at each loop iteration.
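
The consume-and-replace behaviour of the event set can be sketched for the φ- and σ-function rules, which leave the invariant store untouched. This is a rough illustration rather than the semantics itself: the event set is modeled as a Python dict from names to values, the function names are hypothetical, and the convention that a predicate value i selects the i-th σ output counting from zero is our assumption.

```python
from typing import Dict, List, Optional

EventSet = Dict[str, int]  # name -> value; a dict cannot define a name twice

def step_phi(events: EventSet, target: str,
             sources: List[str]) -> Optional[EventSet]:
    """t = phi(t1..tn): consume whichever ti is present and emit t."""
    for s in sources:
        if s in events:
            rest = {n: v for n, v in events.items() if n != s}
            rest[target] = events[s]
            return rest
    return None  # rule not enabled

def step_sigma(events: EventSet, targets: List[str],
               pred: str, source: str) -> Optional[EventSet]:
    """<t1..tn> = sigma(P, t): consume P and t, emit the selected ti."""
    if pred not in events or source not in events:
        return None  # rule not enabled
    selected = targets[events[pred]]  # assumed: P = i picks targets[i]
    rest = {n: v for n, v in events.items() if n not in (pred, source)}
    rest[selected] = events[source]
    return rest
```

For instance, starting from the single event i0 = 0, the φ-function i1 = φ(i0, i3) fires to give the event set {i1 = 0}; the event for i0 is gone, which is the "used up" token behaviour that the invariant store exists to repair for loop invariants.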


7.4 Construction 


Construction of SSI* is only a slight variation on the construction
algorithms for SSI₀. First, dominator and post-dominator trees are produced
using the Lengauer-Tarjan [25] or Harel [16] algorithm. The nodes of
the dominator tree are numbered in pre-order such that for all nodes N,
num[N] > num[idom[N]]. Then, in a single traversal of the post-dominator
tree, we find the lowest-numbered node post-dominated by any given node.
We add triggers to constants from variables defined at this lowest node
post-dominated by the constant use, using the default trigger C_T where
necessary. We then place φ- and σ-functions for all variables, including
constant triggers, using Algorithm 5.3.

We then generate ξ-functions. A standard interval analysis creates a
loop nesting tree, and each loop is scanned for invariants and other
definitions/uses to create the proper ξ-function tuples. Renaming is done
using Algorithm 5.4, as before.
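
The single traversal of the post-dominator tree mentioned above can be sketched as a bottom-up pass. The Python below is a minimal illustration, assuming the post-dominator tree is given as a child-list dictionary, that `num` holds the dominator-tree pre-order numbers, and that every node post-dominates itself; the function name and data layout are hypothetical, not those of FLEX.

```python
from typing import Dict, Hashable, List

Node = Hashable

def lowest_postdominated(pdom_children: Dict[Node, List[Node]],
                         num: Dict[Node, int],
                         root: Node) -> Dict[Node, Node]:
    """For each node, the lowest-numbered node in its post-dominator subtree."""
    lowest: Dict[Node, Node] = {}
    stack = [(root, False)]
    while stack:  # iterative post-order walk, so children finish first
        node, children_done = stack.pop()
        if not children_done:
            stack.append((node, True))
            for child in pdom_children.get(node, []):
                stack.append((child, False))
        else:
            best = node  # a node post-dominates itself (assumed reflexive)
            for child in pdom_children.get(node, []):
                if num[lowest[child]] < num[best]:
                    best = lowest[child]
            lowest[node] = best
    return lowest
```

On a diamond-shaped flow graph (START, A, then B or C, then D, END) a constant used in D would be triggered from START, since D post-dominates every earlier node, whereas a constant used only in branch B can be triggered no earlier than B itself.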


7.5 Dataflow and control dependence 


The SSI* semantics are data-driven, and thus bring to mind work on
compilers for dataflow machines. Beck, Johnson, and Pingali have previously
written [6] on the benefits of dataflow-oriented intermediate
representations. However, the previous work on dataflow compilers (Traub
[42], for example) has concentrated on intra-loop dependencies, often leaving
in pseudo-control-flow edges to serialize non-loop structures. This strategy
results in the sort of fine-grain intra-loop parallelism suitable for parallel
dataflow machines, vector processors, and VLIW machines.

The current work concentrates on removing unnecessary dependencies
between loops, which allows a coarser parallelism that needs fewer
functional units to exploit. Moreover, we extract parallel sequential
threads that are not loop-based. Obviously both fine-grain
and coarse-grain parallelism are important, but we feel the current industry
trend towards loosely coupled multiprocessors supports our coarser-grained
approach, which has, to date, seemingly been neglected by dataflow
approaches.


7.6 Hardware compilation


The observant reader may have noticed that the two operational semantics 
given in section 7.3 closely resemble circuit implementations for the pro- 
gram according to synchronous and asynchronous design methodologies. 
In fact, SSI* was designed specifically to facilitate rendering a high-level 
program into hardware. The two semantics differ primarily on how cyclic 
dependencies (i.e. loops) are handled. 

Translation of high-level languages directly to hardware has long been 
a goal of researchers. Tanaka et al. constructed a system based on FOR- 
TRAN [41], and Galloway’s C-based hardware description language [13] 
inspired a new interest in applying general-purpose languages to the task. 
The recent general use of type-safe object-oriented languages has encour- 
aged speculation that the more favorable analysis properties of these stricter 
languages would enable further advances in general-use hardware
compilation. In this context, the well-defined semantics and data-flow
orientation of SSI* solve the local-level hardware compilation problem and
allow effort to be concentrated on the more difficult intra-procedural
analyses required.


8 Methodology 


The SSI intermediate representation described in this paper is the core IR 
for the FLEX compiler infrastructure project, started in July 1998 and 
currently containing about 70,000 lines of Java source code. The FLEX 
compiler reads in Java bytecodes, and targets both the JVM (for high-level 
portable code transformations) and several combinations of machine
architectures and runtime systems. Currently the bytecode and ARM processor
backends are near completion. Interpreters exist for the various
intermediate representations used in the compiler, allowing the correctness
of the earlier passes of the compiler to be verified. The compiler will
correctly compile itself to IR and interpret itself.

The FLEX compiler implements the algorithms described in this paper, 
validating their correctness. Variable counting for the graphs of section 5.4 
was done by a special statistics module that could be applied to the results 
of any pass. The full bit-width-extended SPTC constant propagation
algorithm was implemented, although we currently do not use the bit-width
information produced. SSI* and hardware compilation are the focus of
current work.


9 Conclusions 


The Static Single Information form extends SSA, without adding unneeded
complexity, to allow efficient predicated analysis and backward dataflow
analyses. Further, the SSI* variant removes all explicit control-dependence
relations, allowing extraction of parallelism from the code, and possesses a
complete and straightforward semantics which makes it useful for, among
other things, abstract interpretation and hardware compilation.

We have demonstrated efficient construction of SSI form, and several 
optimizations which use it to obtain efficiency improvements over previous 
methods. The many SSA-variant papers in the literature attest to limi- 
tations of standard SSA form; we believe SSI form solves these problems 
in a simple and symmetric manner. The FLEX compiler infrastructure
demonstrates the practicality of SSI form.